preceded by an introduction, which is intended as a historical contextualization of 
the field of Information Economics. A translation of the introduction in French is 
available after the English version. The bibliography for each of the papers is located 
at the end of each chapter, whereas the appendices are all located in a common 
section at the end of the thesis. 
Introduction 


English version 


Out of the many ways in which the 20th century might be remembered in the 
future, perhaps the most distinctive one is as the century in which the concept 
of information acquired a central status in the way human societies perceive the 
world and organize themselves. It was over that century that humanity discovered 
that all life forms rely on biological information stored in nucleic acids such as 
DNA, and that uncertainty is a fundamental part of physical reality at the quantum 
scale. It was also over that century that the development of the personal computer 
and the internet revolutionized the way information is collected, processed and 
distributed, fundamentally changing how modern societies function. In the same way 
that thermodynamics and its concepts were the central categories in the epistemic 
regime prevalent during the industrial revolutions of the 18th and 19th centuries, 
information and its concepts became central epistemic categories during the digital 
revolutions of the 20th and 21st centuries. 

The development of economic theory since the mid-20th century has echoed 
this trend. Work by Akerlof, Stigler, Stiglitz¹ and others made clear how 
imperfect and asymmetrically distributed information plays a central role in the 
determination of economic phenomena, demonstrating how previously analyzed equilibria 
could be fundamentally altered by even small changes in information. Information 
became not only one of the “fundamental particles” out of which microeconomic 
theories are built, alongside preferences, institutions and technology, but it also came 
to be regarded as a key commodity in any economy, sparking the beginning of a 
research agenda aimed at understanding how it is acquired, shared and used by 
economic agents. 

¹ See Akerlof (1970); Stigler (1961); Spence (1973); Stiglitz (1975); Rothschild and Stiglitz (1976); 
Milgrom and Roberts (1986). 

This was not without technical challenges. Happily, language to talk about such 
concepts was already being developed. Notions related to the measurement and to 
ways of ordering information were developed by Shannon and Blackwell (Shannon, 
1948; Blackwell, 1951), whereas equilibrium concepts accounting for incomplete 
information had been proposed in Harsanyi (1968). 


Information Economics. It is worthwhile to briefly present some of the main 
strands of the literature in Information Economics, and how they relate to that broad 
agenda concerning the “life-cycle” of information. 

The process of information acquisition is the subject of rational inattention models 
(Sims, 2003; Matějka and McKay, 2015), in which agents facing a certain decision 
problem choose what information to acquire given some cost². Another literature 
exploring topics in information acquisition is the strategic experimentation literature 
(Bolton and Harris, 1999; Keller et al., 2005), in which agents need to choose how 
much of their resources (for instance their time) should be allocated to an uncertain 
alternative rather than to a certain one. The strategic element comes into 
play when one considers that information from one agent might flow to 
another agent, which modifies their incentives to explore the uncertain alternative 
and creates a free-rider problem. 

The theme of information flowing between individuals is also present in the social 
learning literature (Banerjee, 1992; Smith and Sørensen, 2000), which studies how 
dispersed information is aggregated when agents can observe (and thus infer from) 
the actions taken by other agents. Typically information fails to be fully aggregated 
in such settings because agents might find it optimal to just take the same action 
as they see other agents taking, instead of conditioning their action on their own 
information. 

Of course, one central way in which information flows is through communication. 
This is the focus of both the cheap talk literature (Crawford and Sobel, 1982) and the 
literature on verifiable message models (Grossman, 1981; Milgrom, 1981). Cheap talk 
models consider settings in which the agent that sends information is unconstrained, 
being able to choose any message costlessly (that is, being able to lie). Verifiable 
message models, on the other hand, study equilibria in situations in which the sender 
is able to choose how much of its information to transmit, but cannot lie. 

² The cost of acquiring (or processing) some piece of information is typically considered to be 
proportional to the reduction in the Shannon entropy of the belief that it causes. Some limitations 
of the usage of these types of costs are discussed in Angeletos and Sastry (2019); Morris and Yang 
(2021); Nieuwerburgh and Veldkamp (2010). 

Another relevant line of research concerns the way information is used by agents. 
While the Bayesian paradigm provides a natural way to model the interpretation of 
evidence, experimental research has pointed out a number of ways in which people 
might interpret evidence differently from what the Bayesian model prescribes. 
Theoretical literature has mostly focused on studying how these different biases affect 
how information is translated into behavior, but has so far devoted less attention 
to another of their implications: how they shape the incentives guiding how information 
is produced and shared. The last two chapters of this thesis tackle this issue, with 
each chapter considering a different deviation from the standard Bayesian model. 

A final strand of the literature that is important to mention is the one on 
Information Design (Kamenica and Gentzkow, 2011; Bergemann and Morris, 2019), 
which aims at understanding which informational environments are optimal under 
some objective, in different settings. The three papers contained in this thesis 
mainly relate to this literature. The first chapter, focused on a theme present in 
the digital economy, studies price discrimination through the lens of data-driven 
market segmentation. It is concerned with market segmentations that are optimal 
for consumers and that prioritize poorer consumers. The second and third chapters 
are at the frontier of information design and behavioral economics. Each of them 
explores the impact of a different deviation from the standard Bayesian model on 
the design of information structures. The second chapter considers the impact of 
“wishful thinking” (the tendency of individuals to distort their beliefs towards more 
optimistic scenarios), while the third chapter considers receivers with heterogeneous 
levels of understanding of the information conveyed. Below is a brief presentation of 
the themes present in each of the chapters. 


Price Discrimination with Redistributive Concerns. Price discrimination 
is the subject of an extensive literature in Economics, dating back to Pigou (1920) 
and Robinson (1933). Historically, this literature would consider markets in which 
consumers were exogenously segmented (for instance because they were 
distributed geographically in a manner that reflected their characteristics). 
Given some fixed segmentation of consumers, economists would try to derive 
conditions on the segmentation such that welfare (both the producer’s and consumers’) 
would be higher or lower relative to the case with an unsegmented market. 

Recently, however, there has been renewed interest in this practice. 
This was prompted both by the increased practical relevance of this topic since the 
rise of digital markets, in which platforms possessing rich amounts of consumer data 
are able to flexibly segment consumers, and by developments in economic 
theory that left us better equipped to think analytically about this issue. Instead 
of thinking of consumer segmentations as exogenously given, this recent literature 
reasons over the space of all possible segmentations of consumers, allowing us to think 
about what welfare outcomes are feasible in general (Bergemann et al., 2015) and to 
pin down segmentations that have a particular normative or positive appeal. 

The aim of this paper is to study market segmentations aimed at benefitting 
consumers by lowering the prices they pay, and that prioritize poorer consumers 
in the sense that we are especially concerned with segmentations that lower 
prices more for poorer consumers. We show that while such redistributive 
segmentations are efficient (they maximize total surplus), they might not maximize 
aggregate consumer surplus. Instead, in the process of increasing the surplus of poorer 
consumers, some of the surplus that could potentially belong to some consumers 
ends up with the firm. 

The results in this chapter characterize conditions on the aggregate composition 
of consumers under which this is true, and describe the characteristics of redistributive 
segmentations. 


Persuading a Wishful Thinker. The second chapter of this thesis is concerned 
with how biases on the receiving end of information affect incentives for information 
production and disclosure. We consider a model in which an interested sender devises 
an information structure to inform a biased receiver. The receiver is biased in that it 
distorts the informational content of the signal it observes, systematically holding 
beliefs that are more optimistic given its preferences. 

We discuss the way in which such bias causes preferences to interact with beliefs 
and establish conditions for such biased receivers to be harder or easier to persuade. 
We use the insights from this model to illustrate why information campaigns might 
be ineffective at inducing preventive health behavior, how financial advisors might 
find it easier to sell riskier assets and how strategic information disclosure in elections 
might lead to increased polarization. 


Text and Subtext. The third chapter, entitled “Text and Subtext”, is devoted 
to analyzing information as a multi-layer concept. The basic idea explored in the 
chapter is that a piece of information might have varying degrees of depth, depending 
on the person interpreting it. 

The idea of depth in a piece of information is one that is culturally familiar. 
Enlightenment philosophers were explicit about the distinction between the exoteric 
reading of philosophical texts (the part of a text that was commonly understood) and 
the esoteric reading (the aspects that could only be grasped by some), with authors such as 
Leibniz explicitly mentioning the deliberate usage of both modes as a strategy to 
make metaphysical writings acceptable to a more general (and, in his time, dogmatic) 
audience while still conveying the intended message to selected readers. A more recent 
illustration of the strategic use of multi-layered information is the phenomenon 
known as dog whistling: the usage, usually in political speeches, of coded language 
aimed at signaling something privately to some listeners without antagonizing others. 

The aim of this chapter is to translate these ideas into the language of modern 
information design. We characterize the joint distributions of beliefs that can be attained 
by any information structure when members of the audience vary in the depth at which 
they can assess information, and provide a procedure that retrieves the value that a 
sender can obtain by exploiting such heterogeneity in understanding. 
Version française 


Parmi les nombreuses façons dont le XXe siècle pourrait être considéré à l’avenir, 
la plus caractéristique est peut-être celle du siècle où le concept d’information a 
acquis une place centrale dans la manière dont les sociétés humaines perçoivent le 
monde et s’organisent. C’est au cours de ce siècle que l’humanité a découvert que 
toutes les formes de vie dépendent de l’information biologique stockée dans les acides 
nucléiques tels que l’ADN, ou que l’incertitude est une partie fondamentale de la 
réalité physique à l’échelle quantique. C’est également au cours de ce siècle que le 
développement de l’ordinateur personnel et de l’Internet a révolutionné la manière 
dont l’information est collectée, traitée et diffusée, changeant fondamentalement le 
fonctionnement des sociétés modernes. De la même manière que la thermodynamique 
et ses concepts ont été les catégories centrales du régime épistémique prévalant durant 
les révolutions industrielles des XVIIIe et XIXe siècles, l’information et ses concepts 
sont devenus les catégories épistémiques centrales lors des révolutions numériques 
des XXe et XXIe siècles. 

Le développement de la théorie économique depuis le milieu du XXe siècle a 
suivi cette tendance. Les travaux d’Akerlof, Stigler, Stiglitz³ et d’autres auteurs ont 
mis en évidence le rôle central de l’information imparfaite et distribuée de manière 
asymétrique dans la détermination des phénomènes économiques. Ils ont démontré 
comment les équilibres précédemment analysés pouvaient être profondément modifiés 
par de légères variations de l’information. L’information est ainsi devenue l’un des 
éléments fondamentaux sur lesquels reposent les théories microéconomiques, aux 
côtés des préférences, des institutions et de la technologie. Elle a également été perçue 
comme un élément clé dans toute économie, donnant naissance à un programme de 
recherche visant à comprendre comment elle est acquise, partagée et exploitée par 
les acteurs économiques. 

Cela n’a pas été sans soulever des défis techniques. Fort heureusement, un 
vocabulaire adapté à de tels concepts était déjà en cours d’élaboration. Les notions 
relatives à la quantification et à la comparaison des structures d’information ont été 
développées par Shannon et Blackwell (Shannon, 1948; Blackwell, 1951), alors que 
des concepts d’équilibre prenant en compte l’information incomplète ont été définis 
par Harsanyi (1968). 


³ Voir Akerlof (1970); Stigler (1961); Spence (1973); Stiglitz (1975); Rothschild and Stiglitz (1976); 
Milgrom and Roberts (1986). 

L’Économie de l’information. Il est utile de présenter brièvement certains des 
axes majeurs de la littérature en économie de l’information et leur lien avec le vaste 
programme de recherche relatif au « cycle de vie » de l’information. 

Le processus d’acquisition de l’information est l’objet des modèles d’inattention 
rationnelle (Sims, 2003; Matějka and McKay, 2015). Dans ceux-ci, les agents confrontés 
à un problème de décision choisissent quelles informations obtenir étant donné leur 
coût d’acquisition⁴. Un autre domaine de recherche lié à l’acquisition d’information 
concerne l’expérimentation stratégique (Bolton and Harris, 1999; Keller et al., 2005), 
où les agents doivent décider quelle proportion de leurs ressources (par exemple, 
leur temps) doit être allouée à une alternative incertaine plutôt qu’à une alternative 
certaine. L’aspect stratégique intervient lorsque l’on considère que l’information d’un 
agent peut être transmise d’une manière ou d’une autre à un autre agent, modifiant 
ainsi leurs incitations à explorer l’alternative incertaine et créant un problème de 
passager clandestin. 

Naturellement, la communication est l’un des moyens fondamentaux par lesquels 
l’information se propage. Cela est au centre des modèles de communication sans coût 
(Crawford and Sobel, 1982) et de la littérature sur les modèles de communication 
certifiable (Grossman, 1981; Milgrom, 1981). Les modèles de communication sans coût 
considèrent des situations où l’agent qui envoie l’information n’est pas contraint et 
peut transmettre n’importe quel message (c’est-à-dire qu’il peut mentir). En revanche, 
les modèles de communication certifiable étudient les situations où l’expéditeur peut 
décider quelle partie de ses informations transmettre, mais ne peut pas mentir. 

Un autre domaine de recherche pertinent concerne la manière dont les agents 
exploitent l’information. Bien que le paradigme bayésien propose une approche 
naturelle pour modéliser l’interprétation de l’information, la recherche expérimentale 
a identifié plusieurs façons dont les individus peuvent interpréter l’information 
différemment de ce qui est prévu dans le modèle bayésien. La littérature théorique s’est 
principalement concentrée sur l’étude de l’impact de ces divers biais sur la manière 
dont l’information se traduit en comportement, mais a accordé moins d’attention 
à une autre de leurs implications : comment ces biais influencent les incitations qui 
guident la production et la diffusion de l’information. Les deux derniers chapitres 
de cette thèse traitent de cette problématique, chacun examinant une déviation 
différente par rapport au modèle bayésien standard. 

⁴ Le coût d’acquisition (ou de traitement) d’une information est généralement considéré comme 
proportionnel à la réduction de l’entropie de Shannon. Certaines limitations de l’utilisation de 
ces types de coûts sont discutées dans Angeletos and Sastry (2019); Morris and Yang (2021); 
Nieuwerburgh and Veldkamp (2010). 

Un dernier aspect de la littérature qu’il est important de mentionner est celui de la 
conception de l’information (Kamenica and Gentzkow, 2011; Bergemann and Morris, 
2019), qui vise à comprendre quels environnements informationnels sont optimaux 
selon un objectif donné, dans différents contextes. Les trois articles contenus dans 
cette thèse se rattachent principalement à cette littérature. Le premier chapitre, axé 
sur un thème présent dans l’économie numérique, étudie la discrimination tarifaire à 
travers le prisme de la segmentation de marché basée sur les données. Il porte sur 
les segmentations de marché optimales pour les consommateurs et qui privilégient les 
consommateurs les plus pauvres. Les deuxième et troisième chapitres se situent à la 
frontière entre la conception de l’information et l’économie comportementale : chacun 
d’eux explore l’impact d’une déviation différente du modèle bayésien standard sur 
la conception des structures d’information. Le deuxième chapitre examine l’impact 
des « croyances motivées » (la tendance qu’ont les individus à déformer leurs 
croyances dans la direction de leurs désirs), tandis que le troisième chapitre considère 
les destinataires ayant des niveaux de compréhension hétérogènes de l’information 
transmise. Voici une brève présentation des thèmes abordés dans chacun des chapitres. 


Discrimination tarifaire et préoccupations redistributives. La discrimination 
tarifaire fait l’objet d’une vaste littérature en économie, remontant à Pigou 
(1920) et Robinson (1933). Historiquement, cette littérature considérait des marchés 
dans lesquels les consommateurs étaient segmentés de manière exogène, par exemple 
de manière géographique. Étant donnée une segmentation des consommateurs, les 
économistes ont cherché à déterminer les conditions dans lesquelles le bien-être (à la 
fois du producteur et du consommateur) est supérieur ou inférieur par rapport au 
cas d’un marché non segmenté. 

Récemment, toutefois, l’intérêt porté à cette pratique a connu un regain. Cela a été 
dû à la fois à l’importance accrue de ce sujet dans le contexte des marchés numériques, 
où les plateformes disposant de riches données sur les consommateurs sont en mesure 
de segmenter les consommateurs de manière flexible, ainsi qu’aux développements de 
la théorie économique ayant permis de réfléchir de manière plus analytique à cette 
question. Au lieu de considérer les segmentations des consommateurs comme une 
donnée exogène, la littérature récente prend l’ensemble des segmentations possibles 
des consommateurs comme une variable de choix, ce qui nous permet de réfléchir 
aux résultats possibles en termes de bien-être en général (Bergemann et al., 2015) et 
d’identifier les segmentations ayant un attrait normatif ou positif particulier. 

L’objectif de cet article est d’examiner les segmentations de marché qui visent à 
favoriser les consommateurs en diminuant les prix qu’ils paient, tout en accordant la 
priorité aux consommateurs les plus pauvres, étant donné que nous nous intéressons 
spécifiquement aux segmentations qui réduiront davantage les prix pour ces derniers. 
Nous montrons que si de telles segmentations redistributives sont efficientes au sens 
de Pareto (elles maximisent le surplus total), elles peuvent ne pas maximiser le 
surplus des consommateurs. Au lieu de cela, dans le processus d’augmentation du 
surplus des consommateurs les plus pauvres, une partie du surplus qui pourrait 
potentiellement revenir à certains consommateurs se retrouve dans les mains de la 
firme. 

Les résultats de ce chapitre caractérisent les conditions relatives à la composition 
globale des consommateurs pour lesquelles cela est vrai et mettent en évidence les 
caractéristiques des segmentations redistributives. 


Persuasion et croyances motivées. Le deuxième chapitre de cette thèse examine 
comment les biais chez les destinataires de l’information influencent les incitations 
à produire et divulguer des informations. Nous étudions un modèle dans lequel un 
expéditeur conçoit une structure d’information afin de persuader un destinataire 
biaisé. Ce dernier est biaisé dans la mesure où il déforme le contenu informationnel 
du signal qu’il reçoit, en maintenant systématiquement des croyances allant dans le 
sens de ses préférences. 

Nous analysons comment ce biais provoque une interaction entre les préférences et 
les croyances et déterminons les conditions dans lesquelles ces destinataires biaisés sont 
plus difficiles ou plus faciles à convaincre. Nous nous appuyons sur les enseignements 
de ce modèle pour illustrer pourquoi les campagnes d’information pourraient ne pas 
réussir à encourager un comportement préventif en matière de santé, comment les 
conseillers financiers pourraient trouver plus aisé de vendre des actifs plus risqués et 
comment la divulgation d’information stratégique lors des élections pourrait conduire 
à une polarisation accrue. 


Texte et sous-texte. Le troisième chapitre, intitulé « Texte et sous-texte », se 
consacre à l’analyse de l’information en tant que concept à plusieurs niveaux. L’idée 
principale abordée dans ce chapitre est qu’une information peut présenter différents 
degrés de profondeur, selon la personne qui l’interprète. 

La notion de profondeur de l’information est culturellement familière. Les philosophes 
des Lumières établissaient clairement la distinction entre la lecture exotérique 
(la partie d’un texte communément comprise) et ésotérique (les aspects accessibles 
seulement à certains) des textes philosophiques. Des auteurs tels que Leibniz 
mentionnaient explicitement l’usage délibéré des deux modes comme stratégie pour 
rendre les écrits métaphysiques acceptables pour un public plus large (et, à son 
époque, dogmatique), tout en transmettant le message voulu aux lecteurs choisis. 
Un exemple plus récent d’utilisation stratégique d’informations à plusieurs niveaux 
est le phénomène appelé « dog whistling » : l’emploi, généralement dans les discours 
politiques, d’un langage codé ayant pour but de communiquer quelque chose en privé 
à certains auditeurs sans en froisser d’autres. 

Le but de ce chapitre est de transposer ces idées dans le langage formel de la 
conception d’information. Nous établissons les distributions conjointes de croyances 
pouvant être atteintes par n’importe quelle structure d’information lorsque le public 
présente une diversité dans sa capacité à évaluer la profondeur de l’information. Nous 
proposons également une procédure permettant de déterminer le gain espéré qu’un 
émetteur peut obtenir en tirant parti d’une telle hétérogénéité de compréhension. 
Chapter 1 


Price Discrimination with Redistributive Concerns 


Abstract 


Consumer data can be used to sort consumers into different market 
segments, allowing a monopolist to charge a different price in each segment. 
We study consumer-optimal segmentations with redistributive 
concerns, i.e., that prioritize poorer consumers. Such segmentations are 
efficient but may grant additional profits to the monopolist, compared 
to consumer-optimal segmentations with no redistributive concerns. We 
characterize the markets for which this is the case and provide a procedure 
for constructing optimal segmentations given a strong redistributive motive. 
For the remaining markets, we show that the optimal segmentation 
is surprisingly simple: it generates one segment with a discount price and 
one segment with the same price that would be charged if there were no 
segmentation. 


This chapter is joint work with Alexis Ghersengorin and Victor Augias. We thank Eduardo 
Perez-Richet for his guidance on this project. We also thank Matthew Elliott, Jeanne Hagenbach, 
Emeric Henry, Emir Kamenica, Frédéric Koessler, Shengwu Li, Franz Ostrizek, Nikhil Vellodi, 
Colin Stewart, and seminar participants at Sciences Po, Paris School of Economics, Northwestern 
University, University of Konstanz, CUNEF, University of Rome “Tor Vergata”, University of 
Barcelona, University of Amsterdam and WU Vienna for helpful discussions. This project has 
received funding from the European Research Council (ERC) under the European Union’s Horizon 
2020 research and innovation programme (grant agreement 850996 MOREV and 101001694 
IMEDMC). 
1.1 Introduction 


Consumers are continuously leaving traces of their identities on the internet, be it 
through social media activity, search-engine use, online purchasing and so 
on. The vast amount of consumer data that is generated and collected has acquired 
the status of a highly valued good, as it allows firms to tailor advertisements and 
prices to different consumers. In practice, the availability of consumer data segments 
consumers: observing that a given consumer has certain characteristics allows firms 
to fine-tune how they interact with people that share those characteristics. How 
coarse-grained the available information about consumers is determines how they 
will be segmented, what sort of digital market interactions they will have and what 
prices they will pay. This suggests room for regulatory oversight. 

As shown by Bergemann et al. (2015), consumer segmentation and price discrimination 
can induce a wide range of welfare outcomes. It can not only be used to 
increase social surplus (by creating segments with prices that allow more consumers 
to buy), but can also be performed in a way that ensures that all created surplus 
accrues to consumers, that is, in a way that maximizes consumer surplus. This is done by 
creating segments that pool together consumers with high and low willingness to 
pay, thus allowing consumers with a higher willingness to pay to benefit from lower prices. 
However, an important aspect of price discrimination that remains overlooked by the 
literature is its distributive effect: since different consumers pay different prices, this 
practice defines how surplus is distributed across consumers, raising questions about 
how it can benefit poorer consumers relative to richer ones. Indeed, if willingness to 
pay and wealth are positively related, segmentations that maximize total consumer 
surplus tend to benefit richer consumers. 

In this paper we provide a normative analysis of the distributive impacts of market 
segmentation. Our aim is to study how this practice impacts different consumers 
and how it should be performed under the objective of increasing consumer welfare 
while prioritizing poorer consumers. Our results identify qualitative characteristics of 
segmentations that achieve this goal, which can be used to inform future regulation. 
Importantly, our analysis also shows that the prioritization of poorer consumers can 
be inconsistent with the maximization of total consumer surplus: raising the surplus 
of poorer consumers may only be possible while granting additional profits to the 
producer, at the expense of richer consumers. 

We consider a setting in which a monopolist sells a good on a market composed 
of heterogeneous consumers, each of whom can consume at most one unit and is 
characterized by their willingness to pay for the good. A social planner can provide 
information about consumers’ willingness to pay to the monopolist. The information 
provision strategy effectively divides the aggregate pool of consumers into different 
segments, each of which can be priced differently by the monopolist. The social 
planner’s objective is to maximize a weighted sum of consumers’ surplus. As in Dworczak 
et al. (2021), we consider weights that are decreasing in the consumer’s willingness 
to pay, capturing the notion of a redistributive motive under the assumption that 
consumers with higher willingness to pay are on average richer than those with lower 
willingness to pay. 
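
The setting just described can be made concrete with a small numerical sketch. Everything in it is an illustrative assumption rather than part of the paper: the three-type market (willingness to pay 1, 2 and 3, each with mass 1/3), the hand-picked segmentation, the weights, the function names, and the rule that the monopolist breaks ties in favor of the lowest price. The code computes the monopolist's price in each segment and the planner's weighted sum of consumer surplus.

```python
from fractions import Fraction as F


def monopoly_price(segment):
    """Revenue-maximizing price in a segment {willingness to pay: mass}.

    Assumption of this sketch: ties are broken toward the lowest price.
    """
    best_price, best_revenue = None, None
    for price in sorted(segment):
        demand = sum(mass for value, mass in segment.items() if value >= price)
        revenue = price * demand
        if best_revenue is None or revenue > best_revenue:
            best_price, best_revenue = price, revenue
    return best_price


def weighted_consumer_surplus(segmentation, weights):
    """Planner's objective: weighted consumer surplus summed over segments."""
    total = F(0)
    for segment in segmentation:
        price = monopoly_price(segment)
        total += sum(weights[v] * m * (v - price)
                     for v, m in segment.items() if v >= price)
    return total


def profit(segmentation):
    """Monopolist's total profit when each segment is priced separately."""
    total = F(0)
    for segment in segmentation:
        price = monopoly_price(segment)
        total += price * sum(m for v, m in segment.items() if v >= price)
    return total


# Hypothetical market: three types with equal mass.
market = {1: F(1, 3), 2: F(1, 3), 3: F(1, 3)}

# Hypothetical segmentation; the masses of each type add up to the market.
segmentation = [
    {1: F(1, 3), 2: F(1, 9), 3: F(2, 9)},  # monopolist indifferent, price 1
    {2: F(1, 18), 3: F(1, 9)},             # monopolist indifferent, price 2
    {2: F(1, 6)},                          # price 2
]

unit_weights = {1: 1, 2: 1, 3: 1}      # no redistributive concern
decreasing_weights = {1: 3, 2: 2, 3: 1}  # weights decreasing in willingness to pay

print(monopoly_price(market))                                   # 2
print(weighted_consumer_surplus([market], unit_weights))        # 1/3
print(weighted_consumer_surplus(segmentation, unit_weights))    # 2/3
print(weighted_consumer_surplus(segmentation, decreasing_weights))  # 7/9
print(profit([market]), profit(segmentation))                   # 4/3 4/3
```

In this example the segmentation leaves the monopolist's profit at its uniform-price level while raising consumer surplus, and changing the weights changes how the planner values that same allocation. The sketch only evaluates a given segmentation; it does not search for an optimal one.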

We first establish that optimal segmentations are Pareto efficient, so that 
satisfying a redistributive objective does not come at the expense of social surplus. 
Bergemann et al. (2015) show that, in the absence of redistributive concerns, 
consumer-optimal segmentations do not strictly benefit the monopolist: all of the 
surplus created by the segmentation accrues to consumers. In contrast, we show 
that once redistributive preferences are considered, consumer-optimal segmentations 
may imply additional profits to the monopolist. This happens because increasing the 
surplus of poor consumers is done by pooling them with even poorer consumers, such 
that they can benefit from lower prices. In doing so, richer consumers come to make up 
a larger share of other segments, which might increase the price they pay. We 
characterize the set of markets for which this is the case and denote them as rent markets. 
For no-rent markets, on the contrary, any redistributive objective can be met while 
still maximizing total consumer surplus. In this case, our analysis selects one among 
the many consumer-optimal segmentations established by Bergemann et al. (2015). 
These insights are illustrated through a three-type example in Section 1.3. 

Our analysis also provides insights on how to construct optimal segmentations. 
We show that, in no-rent markets, consumer-optimal segmentations with redistributive 
concerns exhibit a strikingly simple form: they divide consumers into two 
segments, one where the price is the same that would be charged under no segmentation 
and one with a discount price. In rent markets, we show that consumer-optimal 
segmentations under sufficiently strong redistributive preferences divide consumers 
into contiguous segments based on their willingness to pay, having consumers with 
the same willingness to pay belong to at most two different segments. This allows 
us to construct a procedure that generates consumer-optimal segmentations under 
strong redistributive preferences, which is discussed in Section 1.4.2. 


1.1.1 Related Literature 


Third-degree price discrimination and its welfare effects are the subject of an extensive
literature. Early analysis (Pigou, 1920; Robinson, 1933) and subsequent development
(Schmalensee, 1981; Varian, 1985) considered exogenously fixed market segmentations
and studied conditions under which such segmentations would increase or decrease
total surplus.

This literature has recently undergone a transformation, prompted by both 
technical innovations in microeconomic theory and the change in character of the 
practice of price discrimination brought about by the ascent of digital markets. 
Recent developments incorporate an information design approach to study the welfare 
impacts of third-degree price discrimination over all possible market segmentations, 
rather than taking a segmentation as exogenously fixed. Bergemann et al. (2015) 
analyze a setting with a monopolist selling a single good and characterize attainable 
pairs of consumer and producer surplus, showing that any division of total surplus
between consumers and the producer that guarantees the producer at least the
uniform-price profit is attainable. In particular, they show that there are typically many
consumer-optimal segmentations of a given market. Their analysis has been extended 
to multi-product settings by Haghpanah and Siegel (2022a,b) and to imperfect 
competition settings by Elliott et al. (2021) and Ali et al. (2022). Hidir and Vellodi 
(2020) study market segmentation in a setting where the monopolist can offer one 
from a continuum of goods to each consumer, such that consumers, upon disclosing 
their information, face a trade-off between being offered their best option and having 
to pay a fine-tuned price. Finally, Roesler and Szentes (2017) and Ravid et al. (2022) 
study the inverse problem of information design to a buyer who is uncertain about 
the value of a good. Our paper differs from these by focusing on how surplus is 
distributed across consumers, and by studying consumer-optimal segmentations 
when different consumers are assigned different welfare weights. We show that, 
once distributional preferences are taken into account, optimal segmentations might 
not coincide with consumer-optimal segmentations under uniform welfare weights. 
When they do, our analysis selects one among the many direct consumer-optimal 
segmentations established in Bergemann et al. (2015). 

Our paper also relates to a recent literature on mechanism design and
redistribution, most notably Dworczak et al. (2021) and Akbarpour et al.
(2020), who study the design of allocation mechanisms under redistributive concerns,
and Pai and Strack (2022), who study the optimal taxation of a good with a negative
externality when agents differ in their utility for the good, their disutility for the
externality and their marginal value for money. A key difference between the results
obtained in these papers and ours is that, in their settings, redistributive mechanisms
are not Pareto efficient: redistribution implies some loss in social surplus. This is not
the case in our paper, where optimal redistributive segmentations always maximize
total surplus.

Finally, our paper relates to Dube and Misra (2022), who study experimentally
the welfare implications of personalized pricing implemented through machine
learning. The authors find a negative impact of personalized pricing on total consumer
surplus, but note that a majority of consumers benefit from price reductions
under personalization, pointing out that under some inequality-averse weighted welfare
functions, data-enabled price personalization might increase welfare. Their paper
shows experimentally how the implementation of market segmentations aimed at
maximizing profits might generate, as a by-product, the redistribution of surplus among
consumers. Our paper, on the other hand, shows theoretically how consumer-optimal
redistributive segmentations might grant additional profits to the firm.


1.2 Model 


A monopolist (he) sells a good to a continuum of mass one of buyers, each of whom 
can consume at most one unit. We normalize the marginal cost of production of the 
good to zero. The consumers privately observe their type $v$, which corresponds to
their willingness to pay for the good. We assume that the consumers' type can take
a finite number $K$ of possible values $V = \{v_1, \dots, v_K\}$, where $0 < v_1 < \dots < v_K$.
We let $\mathcal{K} := \{1, \dots, K\}$. A market $\mu$ is a distribution over the valuations. We denote
the set of all possible markets by:

$$\mathcal{M} := \Delta(V) = \Big\{ \mu \in \mathbb{R}^K \;\Big|\; \sum_{k \in \mathcal{K}} \mu_k = 1 \text{ and } \mu_k \geq 0 \text{ for all } k \in \mathcal{K} \Big\}.$$


Price $v_k$ is optimal for market $\mu \in \mathcal{M}$ if it maximizes the expected revenue of the
monopolist when facing market $\mu$, that is:

$$v_k \sum_{i=k}^{K} \mu_i \;\geq\; v_j \sum_{i=j}^{K} \mu_i, \quad \forall j \in \mathcal{K}.$$


Let $\mathcal{M}_k$ denote the set of markets where price $v_k$ is optimal. It is given by:

$$\mathcal{M}_k = \Big\{ \mu \in \mathcal{M} \;\Big|\; v_k \in \arg\max_{v_i \in V} \, v_i \sum_{j=i}^{K} \mu_j \Big\},$$

for any $k \in \mathcal{K}$. In the remainder of the paper we hold an aggregate market fixed
and denote it by $\mu^0 \in \mathcal{M}$.

Segmentation. The consumers' types are perfectly observed by a social planner
(she) who can segment consumers, that is, sort consumers into different sub-markets.
The set of possible segmentations of an aggregate market $\mu^0$ is given by:

$$\Sigma(\mu^0) = \Big\{ \sigma \in \Delta(\mathcal{M}) \;\Big|\; \int_{\mathcal{M}} \mu \, \sigma(d\mu) = \mu^0 \Big\}.$$


Formally, a segmentation is a probability distribution on $\mathcal{M}$ which averages to the
aggregate market $\mu^0$. The requirement that the different segments generated by
a segmentation average to the aggregate market ensures that the segmentation
simply sorts existing consumers into different groups, without fundamentally altering
the aggregate composition of consumers in a market. This requirement is akin to
the Bayes Plausibility condition that is typically used in the Bayesian Persuasion
literature (Kamenica and Gentzkow, 2011).
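
As a minimal sketch of the averaging requirement, the Python snippet below checks whether a finitely supported candidate segmentation averages to a given aggregate market. The function name `averages_to`, the list-of-pairs representation and the numerical market (the one used in the three-value example of section 1.3) are illustrative assumptions and not part of the model's notation.

```python
from fractions import Fraction as F

def averages_to(segmentation, mu0):
    """Check that a finitely supported segmentation averages to mu0.

    segmentation: list of (weight, market) pairs, where each market is a
    tuple of type shares. Exact fractions avoid rounding issues.
    """
    weights_ok = sum(w for w, _ in segmentation) == 1
    markets_ok = all(sum(m) == 1 and min(m) >= 0 for _, m in segmentation)
    mean = tuple(sum(w * m[k] for w, m in segmentation) for k in range(len(mu0)))
    return weights_ok and markets_ok and mean == tuple(mu0)

# Illustrative three-type aggregate market.
mu0 = (F(3, 10), F(2, 5), F(3, 10))

# Full separation: one degenerate market per type, weighted by the type's share.
full_separation = [(mu0[0], (1, 0, 0)), (mu0[1], (0, 1, 0)), (mu0[2], (0, 0, 1))]

print(averages_to(full_separation, mu0))      # True
print(averages_to([(F(1), (1, 0, 0))], mu0))  # False: changes the composition
```

The second call fails because putting all weight on a single degenerate market would alter the aggregate composition of consumers, which is exactly what the requirement rules out.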

Given a segmentation $\sigma$, the monopolist can price differently at each segment $\mu$
in the support of $\sigma$. A pricing rule is a mapping $p: \mathcal{M} \to V$. As will become clear
in problem 1.4, segments with more than one optimal price play a key role in our
results. We focus on the following pricing rule:

$$p(\mu) = \min \Big\{ \arg\max_{v_k \in V} \, v_k \sum_{i=k}^{K} \mu_i \Big\}.$$

At each segment, the monopolist charges the smallest price among all optimal prices
in that segment. This pricing rule makes the objective of the social planner (stated
in equation (P)) upper semi-continuous and ensures the existence of an optimal
segmentation.¹
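
The pricing rule can be sketched in a few lines of Python. The function names (`revenue`, `optimal_prices`, `price`) and the tuple representation of a market are hypothetical choices made for illustration; the numerical values are those of the three-value example in section 1.3.

```python
from fractions import Fraction as F

def revenue(values, market, k):
    """Revenue from charging values[k]: every type at or above k buys."""
    return values[k] * sum(market[k:])

def optimal_prices(values, market):
    """All revenue-maximizing prices in the given market."""
    revenues = [revenue(values, market, k) for k in range(len(values))]
    best = max(revenues)
    return [v for v, r in zip(values, revenues) if r == best]

def price(values, market):
    """Pricing rule p(mu): the lowest of the optimal prices."""
    return min(optimal_prices(values, market))

values = (1, 2, 3)
print(price(values, (F(3, 10), F(2, 5), F(3, 10))))   # 2
print(optimal_prices(values, (F(1, 2), F(1, 2), 0)))  # [1, 2]
print(price(values, (F(1, 2), F(1, 2), 0)))           # 1
```

In the second market the monopolist is indifferent between the two lowest prices, and the rule resolves the tie in favor of the lower one.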


Social objective. The social planner's objective is to maximize a weighted sum of
consumers' surplus, with positive weights $\lambda \in \mathbb{R}^K_+$. Each dimension $\lambda_k$ of the vector
$\lambda$ corresponds to the marginal contribution to social welfare of consumers of type $v_k$.
The surplus of a consumer of type $v_k$ in market $\mu$ is given by:

$$U_k(\mu) = \max\{0, \, v_k - p(\mu)\}.$$


¹Although technically important, this pricing rule does not impact our results qualitatively.
Indeed, any joint distribution of consumers and prices that can be induced by the social planner 
under this pricing rule could be approximated arbitrarily well by a social planner facing a monopolist 
who selects among optimal prices in some other way. 




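The weighted consumer surplus on market $\mu$ is given by:

$$W(\mu) = \sum_{k \in \mathcal{K}} \lambda_k \, \mu_k \, U_k(\mu),$$

for any $\mu \in \mathcal{M}$. Hence, for any aggregate market $\mu^0$, the social planner's objective is
given by the following maximization program:

$$\max_{\sigma \in \Sigma(\mu^0)} \int_{\mathcal{M}} W(\mu) \, \sigma(d\mu). \tag{P}$$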
Given an aggregate market $\mu^0$, a segmentation $\sigma \in \Sigma(\mu^0)$ is optimal if it solves (P).
We focus on welfare weights that are decreasing in the consumer's willingness to
pay, such that $\lambda_k \geq \lambda_{k'}$ for any $k < k' \leq K - 1$, and say that the social planner has
redistributive preferences if the inequality holds strictly for some $k, k' \in \mathcal{K}$. Under
the assumption that consumers with lower willingness to pay are on average poorer
than consumers with higher willingness to pay, this amounts to attributing a greater
weight to surplus accruing to poorer consumers.²
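
To make the objective concrete, the following Python sketch evaluates the weighted consumer surplus $W(\mu)$ and the value of program (P) at a finitely supported segmentation. The function names and the list-of-pairs representation are assumptions made for illustration, and the numbers are those of the three-value example in section 1.3.

```python
from fractions import Fraction as F

def price(values, market):
    """Lowest revenue-maximizing price (the pricing rule p)."""
    revenues = [v * sum(market[k:]) for k, v in enumerate(values)]
    return min(v for v, r in zip(values, revenues) if r == max(revenues))

def weighted_surplus(values, weights, market):
    """W(mu): sum over types of lambda_k * mu_k * max(0, v_k - p(mu))."""
    p = price(values, market)
    return sum(l * m * max(0, v - p) for l, m, v in zip(weights, market, values))

def objective(values, weights, segmentation):
    """Value of program (P) at a list of (weight, market) pairs."""
    return sum(w * weighted_surplus(values, weights, m) for w, m in segmentation)

values, mu0 = (1, 2, 3), (F(3, 10), F(2, 5), F(3, 10))
no_segmentation = [(1, mu0)]
full_separation = [(mu0[k], tuple(int(i == k) for i in range(3))) for k in range(3)]

print(objective(values, (1, 1, 1), no_segmentation))  # 3/10
print(objective(values, (1, 1, 1), full_separation))  # 0
```

Without segmentation only the highest type obtains surplus, while full separation leaves no surplus to any consumer; the planner's problem is to find segmentations that do better than both under the chosen weights.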


Efficiency. Every consumer has a value for the good that is strictly greater than
the marginal cost of production. Hence, social surplus is maximized when every
consumer buys the good. We say that a market $\mu$ is efficient if every consumer can
buy the good, that is, if the lowest optimal price for the seller at that market allows
everyone to consume: $p(\mu) = \min \operatorname{supp}(\mu)$. For a given market $\mu$ and Pareto weights
$\lambda$, the maximum feasible social surplus is thus given by

$$s(\mu) = \sum_{k \in \mathcal{K}} \lambda_k \, \mu_k \, v_k.$$

Note that a segmentation of $\mu$ achieves $s(\mu)$ if and only if it is efficient. A segmentation
$\sigma$ is efficient if it is only supported on efficient markets.
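
The efficiency condition $p(\mu) = \min \operatorname{supp}(\mu)$ is easy to check numerically. The sketch below is only illustrative: the helper names are hypothetical and the example markets use the values $(1, 2, 3)$ of section 1.3.

```python
from fractions import Fraction as F

def price(values, market):
    """Lowest revenue-maximizing price (the pricing rule p)."""
    revenues = [v * sum(market[k:]) for k, v in enumerate(values)]
    return min(v for v, r in zip(values, revenues) if r == max(revenues))

def is_efficient_market(values, market):
    """Efficient iff the price equals the lowest type present in the market."""
    lowest_present = min(v for v, m in zip(values, market) if m > 0)
    return price(values, market) == lowest_present

def is_efficient_segmentation(values, segmentation):
    """A segmentation is efficient iff every segment in its support is."""
    return all(is_efficient_market(values, m) for w, m in segmentation if w > 0)

values = (1, 2, 3)
print(is_efficient_market(values, (0, 0, 1)))                       # True
print(is_efficient_market(values, (0, F(1, 2), F(1, 2))))           # True
print(is_efficient_market(values, (F(3, 10), F(2, 5), F(3, 10))))   # False
```

The last market is priced at 2 while it contains consumers with value 1, so some consumers are excluded and the market is not efficient.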


²We follow here the approach by Dworczak et al. (2021).

Informational Rents. The profit of the monopolist at market $\mu$ is given by:

$$\pi(\mu) = p(\mu) \sum_{k \in C_{p(\mu)}} \mu_k,$$

where $C_p = \{k \in \mathcal{K} \mid v_k \geq p\}$. The profit of the monopolist under segmentation $\sigma$ is
given by:

$$\Pi(\sigma) = \int_{\mathcal{M}} \pi(\mu) \, \sigma(d\mu).$$

Segmenting the aggregate market can only weakly increase the expected profit of the
monopolist relative to no segmentation. Therefore, we always have $\Pi(\sigma) \geq \pi(\mu^0)$ for
any $\sigma \in \Sigma(\mu^0)$. We say that some segmentation $\sigma$ grants a rent to the monopolist
whenever $\Pi(\sigma) > \pi(\mu^0)$.
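
A short Python sketch of these definitions, under the same illustrative conventions as before (hypothetical function names, a segmentation stored as a list of weight and market pairs, and the numbers of the three-value example in section 1.3):

```python
from fractions import Fraction as F

def price(values, market):
    """Lowest revenue-maximizing price (the pricing rule p)."""
    revenues = [v * sum(market[k:]) for k, v in enumerate(values)]
    return min(v for v, r in zip(values, revenues) if r == max(revenues))

def profit(values, market):
    """pi(mu): price times the mass of types willing to pay it."""
    p = price(values, market)
    return p * sum(m for v, m in zip(values, market) if v >= p)

def total_profit(values, segmentation):
    """Pi(sigma): expected profit over the segments."""
    return sum(w * profit(values, m) for w, m in segmentation)

def grants_rent(values, segmentation, mu0):
    """True when the segmentation strictly raises the monopolist's profit."""
    return total_profit(values, segmentation) > profit(values, mu0)

values, mu0 = (1, 2, 3), (F(3, 10), F(2, 5), F(3, 10))
full_separation = [(mu0[k], tuple(int(i == k) for i in range(3))) for k in range(3)]

print(profit(values, mu0))                         # 7/5
print(total_profit(values, full_separation))       # 2
print(grants_rent(values, full_separation, mu0))   # True
print(grants_rent(values, [(1, mu0)], mu0))        # False
```

Full separation hands the whole surplus to the monopolist, which is the extreme case of a rent; leaving the market unsegmented grants none.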


Uniformly Weighted Consumer-Optimal Segmentations. If $\lambda_k = \lambda_{k'} > 0$
for all $k, k' \in \mathcal{K}$, program (P) corresponds to the maximization of the total consumer
surplus over all possible segmentations. A segmentation that solves this optimization
problem is called uniformly weighted consumer-optimal. As shown in Bergemann
et al. (2015), uniformly weighted consumer-optimal segmentations (i) are efficient,
and hence achieve the maximum feasible social surplus, and (ii) do not grant the
monopolist any rent. For an interior aggregate market $\mu^0$, there exist infinitely
many uniformly weighted consumer-optimal segmentations. In section 1.4.3, we
characterize the set of aggregate markets for which consumer-optimal segmentations
with redistributive preferences are also uniformly weighted consumer-optimal, thus
providing a natural way to select among these segmentations for such markets.


1.2.1 Discussion of the Model 


Information Provision as Segmentation. In digital markets, information provision
about consumers often occurs through the assignment of labels to different
consumers. Indeed, one could think of a model in which the social planner adopts a
signal structure $\phi: V \to \Delta(L)$, where $L$ is a set of labels. The meaning of each label
is then pinned down by the social planner's strategy, and the monopolist optimally
chooses different prices for consumers with different labels.

Such a model is equivalent to ours. Indeed, any segmentation $\sigma \in \Sigma(\mu^0)$ can be
implemented by some signal structure $\phi$, and any signal structure $\phi$ implements some
segmentation $\sigma \in \Sigma(\mu^0)$. The approach of working directly in the space of feasible
distributions over markets rather than in the space of labeling strategies is standard
in the information design literature (Kamenica and Gentzkow, 2011).

Continuum of Consumers. While we consider a setting with a continuum of
consumers, our model is equivalent to one in which there is a discrete number
of consumers, with types independently distributed according to $\mu^0$. Under this
interpretation, the social planner commits ex-ante to an information structure $\sigma$ to
inform the monopolist, which defines the distribution of posterior beliefs $\mu$ that the
monopolist will form upon facing each consumer.


1.3 Three-Value Case


In this section, we illustrate our model and some of the results from the following
sections in the simple three-value case.


Setup. Let us consider three types, $v_1 = 1$, $v_2 = 2$ and $v_3 = 3$. We can conveniently
depict the set of markets $\mathcal{M}$ as the two-dimensional unit simplex (see Mas-Colell
et al., 1995, p.169). It is depicted in figure 1.1, where each vertex of the simplex
represents a degenerate market on a value $v \in V$, denoted by the Dirac measure $\delta_v$.

The left panel of figure 1.1 shows the three different price regions $\mathcal{M}_1$, $\mathcal{M}_2$
and $\mathcal{M}_3$. The points in each of the regions correspond to the markets for which each
of the different prices $\{1, 2, 3\}$ is optimal for the monopolist.³ The border between
two adjacent regions represents markets for which there is more than one optimal
price. Given pricing rule $p$, the price charged in such markets is the lowest among
the optimal ones.

In the right panel, an aggregate market $\mu^0 = (0.3, 0.4, 0.3)$ is represented, which
is in the interior of the region $\mathcal{M}_2$, meaning that $v_2$ is a strictly optimal price for
$\mu^0$. Two possible segmentations are depicted: the one in green dashed lines, which
segments $\mu^0$ into the three degenerate markets (thus implementing first-degree price
discrimination); and the one in black dotted lines, which segments $\mu^0$ into three
segments: $\mu'$, containing all three types and priced at $v_1$; $\mu''$, containing
only types $v_2$ and $v_3$ and priced at $v_2$; and $\mu'''$, containing all three types and
priced at $v_3$.

Any splitting of $\mu^0$ into a set of points $S \subset \mathcal{M}$ represents a feasible segmentation,
as long as $\mu^0 \in \operatorname{co}(S)$.⁴ A segmentation is optimal given weights $(\lambda_1, \lambda_2, \lambda_3)$, with
$\lambda_1 \geq \lambda_2 \geq \lambda_3$, if it maximizes the sum of weighted consumer surplus over all segments
generated. Note that consumers of type $v_1$ never get any consumer surplus (since the
monopolist never charges a price lower than their willingness to pay), such that the
optimal segmentation trades off surplus obtained by types $v_2$ and $v_3$. We will focus,
without loss of generality, on direct segmentations, i.e. segmentations in which there
is not more than one segment with a given price.

³Formally, for any $k$, $\mathcal{M}_k = \operatorname{cl}(p^{-1}(v_k))$, where $\operatorname{cl}(S)$ denotes the topological closure of a generic
set $S$.
⁴For any set $S$, $\operatorname{co}(S)$ denotes the convex hull of $S$.

Figure 1.1: The Simplex representing $\mathcal{M}$ and two feasible segmentations.


General Properties of Optimal Segmentations. A first step for finding the
optimal segmentation of $\mu^0$ is to observe that any optimal segmentation must be
efficient. To see this, consider the black dotted segmentation in the right panel of
figure 1.1. Both $\mu'$ and $\mu''$ are efficient, since all the consumers in these segments
are able to buy the good. The remaining segment $\mu'''$, however, is not efficient, as it
contains some consumers of types $v_1$ and $v_2$ who are not able to consume under
that segment's price. One could solve this by re-segmenting $\mu'''$ in the following way:
creating a segment $\mu'''_1$ containing all of the types $v_1$ and $v_2$ and some of the types $v_3$
that used to belong to $\mu'''$, and another segment $\delta_3$ containing only the remaining
types $v_3$. Note that the amount of type $v_3$ in $\mu'''_1$ can be adjusted to ensure that
this segment will have price $v_1$. That way, both of the resulting segments will be
efficient. Furthermore, this re-segmentation of $\mu'''$ unambiguously increases consumer
welfare, since it has no impact on the welfare of consumers in $\mu'$ and $\mu''$ and (weakly)
increases the surplus of every consumer previously belonging to $\mu'''$.

Indeed, a welfare-increasing segmentation can be performed on any inefficient
market. This narrows down the search for an optimal segmentation, as we know that
it must be supported only on efficient segments. The left panel of figure 1.2 depicts,
in orange, the efficient markets. These are: the degenerate market $\delta_3$; the set of
markets in region $\mathcal{M}_2$ that have no consumer with value 1; and the entire region $\mathcal{M}_1$.

We can further note that, in an optimal segmentation, the segment with price $v_1$
must not belong to the interior of region $\mathcal{M}_1$. To see this, consider the right panel of
figure 1.2. It depicts two segmentations: $\sigma_a$, which splits $\mu^0$ into $\mu_a$ and $\mu'$,
and $\sigma_b$, which splits $\mu^0$ into $\mu_b$ and $\mu'$. Segmentation $\sigma_b$ is always preferred over $\sigma_a$
for two reasons. First, $\mu_b$ has a higher share of types $v_2$ and $v_3$ than $\mu_a$. Since these
are the only two types that are extracting surplus on the segment whose price is $v_1$,
having a higher share of them increases the social planner's objective. Second, $\mu_b$ is
"closer" to $\mu^0$, which means that $\sigma_b(\mu_b) > \sigma_a(\mu_a)$. That means that segmentation $\sigma_b$
is able to include a bigger mass of consumers in the segment where they will extract
the largest surplus, thus also increasing the social planner's objective.

Figure 1.2: Efficient Markets and Segmentations.

The argument outlined above illustrates how every segmentation generating a
segment in the interior of region $\mathcal{M}_1$ must be dominated by some segmentation
that instead generates a segment on the boundary of regions $\mathcal{M}_1$ and $\mathcal{M}_2$. This
amounts to saying that any optimal segmentation must include a segment in which
the monopolist is indifferent between charging price $v_1$ or charging some other price.
The intuition for that is simple: if the monopolist strictly prefers to charge price $v_1$
in that segment, then there is still room for "fitting" other types in that segment in a
Pareto improving way.


Uniformly Weighted Consumer-Optimal Segmentations. We begin by considering
the case where $\lambda_1 = \lambda_2 = \lambda_3$. The left panel of figure 1.3 depicts three
different segmentations, $\sigma_a$, $\sigma_b$ and $\sigma_c$, each of them generating one segment with
price $v_1$ and one segment with price $v_2$. All of these three segmentations are uniformly
weighted consumer-optimal. This follows from the fact that i) they maximize total
(consumer + producer) surplus, since they are all efficient, and ii) the monopolist
does not get any of the surplus that is created from the segmentation.⁵

Indeed, there are uncountably many uniformly weighted consumer-optimal
segmentations of $\mu^0$. All of these are equivalent in that they maximize total consumer
surplus, but they are not equivalent in how they distribute such surplus across
consumers. This can be seen in the right panel of figure 1.3: while the three
segmentations of the left panel induce the same profit for the monopolist and the same total
consumer surplus, $\sigma_c$ induces greater surplus for consumers of type $v_2$ than the other
segmentations. This is so because, among the segments priced at $v_1$, $\mu_c$ is the one
that includes the most consumers of type $v_2$, who can then benefit from a low price.

⁵One way of seeing this is as follows: A decision-maker strictly benefits from observing a piece
of information if, as a result of this observation, she is able to make better decisions than she would
have made absent this information. In our setting, this amounts to the monopolist being able to, as
a result of the segmentation, choose different prices than the uniform price, at markets in which
these different prices are strictly preferred over the uniform price. Since price $v_2$ belongs to the set
of optimal prices in every segment generated by the segmentations in figure 1.3, the monopolist
does not strictly benefit from them.

Figure 1.3: Uniformly Weighted Consumer-Optimal Segmentations.


Consumer-Optimal Segmentations under Redistributive Preferences. Let us
now consider the case when $\lambda_2 > \lambda_3$. Among the segmentations depicted in the
left panel of figure 1.3, segmentation $\sigma_c$ is now preferred over $\sigma_a$ and $\sigma_b$. But is it
optimal? One way of increasing the surplus of consumers of type $v_2$ further is to
exchange consumers between the two segments generated by $\sigma_c$: by exchanging the
remaining consumers of type $v_3$ that are present in the discount segment $\mu_c$ against
some of the consumers of type $v_2$ present in the other segment, one can increase the
amount of types $v_2$ that pay a low price. While this exchange increases the surplus of
types $v_2$, it dramatically decreases the surplus of types $v_3$, since now there are
sufficiently many of them in the other segment for the monopolist to want to increase
the price charged at that segment. This would lead to a segmentation that is no longer
uniformly weighted consumer-optimal: the price increase in that segment would cause
some of the surplus that was previously captured by consumers of type $v_3$ to now be
granted to the monopolist instead. The result below establishes when this exchange is
desirable from the social planner's perspective.

Result 1. Let $\mu^0 = (0.3, 0.4, 0.3)$. Then, the two following assertions are satisfied:

(i) If the inequality

$$\frac{\lambda_2}{\lambda_3} \leq \frac{v_3 + v_2 - v_1}{v_2 - v_1}$$

is satisfied, then the consumer-optimal segmentation under redistributive preferences
is also uniformly weighted consumer-optimal and generates two segments,
one supported on $\{v_1, v_2, v_3\}$ and the other one supported on $\{v_2, v_3\}$. This
segmentation is represented in the left panel of figure 1.4;

(ii) If the inequality

$$\frac{\lambda_2}{\lambda_3} \geq \frac{v_3 + v_2 - v_1}{v_2 - v_1}$$

is satisfied, then the consumer-optimal segmentation under redistributive preferences
is not uniformly weighted consumer-optimal and generates three segments.
The first one is supported on $\{v_1, v_2\}$, the second is supported on $\{v_2, v_3\}$, and
the third is supported on $\{v_3\}$. This segmentation is represented in the right
panel of figure 1.4.

Figure 1.4: Optimal Segmentations with Redistributive Preferences.
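
The threshold in Result 1 can be checked numerically. The sketch below compares two candidate segmentations of $\mu^0 = (0.3, 0.4, 0.3)$ with values $(1, 2, 3)$, for which the threshold $(v_3 + v_2 - v_1)/(v_2 - v_1)$ equals 4. The two candidates are hand-built reconstructions with the supports stated in cases (i) and (ii), written as vectors of consumer masses; they are an assumption of this illustration and are not taken from the figure, and the function names are hypothetical.

```python
from fractions import Fraction as F

values = (1, 2, 3)

def price(masses):
    """Lowest revenue-maximizing price in a segment given by type masses."""
    revenues = [v * sum(masses[k:]) for k, v in enumerate(values)]
    return min(v for v, r in zip(values, revenues) if r == max(revenues))

def weighted_surplus(weights, segments):
    """Weighted consumer surplus of a segmentation given as mass vectors."""
    total = 0
    for masses in segments:
        p = price(masses)
        total += sum(l * m * max(0, v - p) for l, m, v in zip(weights, masses, values))
    return total

# Candidate for case (i): segments supported on {v1, v2, v3} and {v2, v3}.
two_segments = [(F(3, 10), F(4, 15), F(1, 30)), (0, F(2, 15), F(4, 15))]
# Candidate for case (ii): segments supported on {v1, v2}, {v2, v3} and {v3}.
three_segments = [(F(3, 10), F(3, 10), 0), (0, F(1, 10), F(1, 5)), (0, 0, F(1, 10))]

for ratio in (3, 4, 5):
    weights = (ratio, ratio, 1)  # lambda_1 plays no role: type v1 never gets surplus
    gap = weighted_surplus(weights, three_segments) - weighted_surplus(weights, two_segments)
    print(ratio, gap)
# 3 -1/30
# 4 0
# 5 1/30
```

The sign of the gap flips exactly at $\lambda_2/\lambda_3 = 4$: below the threshold the two-segment candidate gives a higher weighted surplus, above it the three-segment candidate does.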


An important consequence of this result is that if the social planner's preferences
are sufficiently redistributive, meaning that $\lambda_2$ is sufficiently greater than $\lambda_3$, the
optimal segmentation might give a rent (i.e. an additional profit) to the monopolist.
By packing more consumers with lower types together, the social planner also 
makes higher types more distinguishable, thus allowing the monopolist to raise their 
prices. The above example illustrates the main argument of the paper: while market 
segmentation can redistribute surplus without any loss of efficiency, sometimes raising 
the surplus of poorer consumers can only be done if some of the surplus from richer 
consumers is granted to the monopolist. 

However, not every aggregate market requires the granting of rents to the monopolist
in order to satisfy redistributive objectives. Consider for instance the aggregate
market $\mu^0 = (0.2, 0.65, 0.15)$, represented in the left panel of figure 1.5. The optimal
segmentation of this market given any preferences $\lambda_2 > \lambda_3$ is the one depicted in the
figure: it always generates a segment with $\{v_1, v_2\}$ and another one with $\{v_2, v_3\}$, and
this segmentation is always uniformly weighted consumer-optimal. On this aggregate
market, satisfying a redistributive objective never requires granting rents to the
monopolist because it contains sufficiently many consumers of type $v_2$, such that even
after pooling as many of them as possible with types $v_1$ in the first segment, there are
still sufficiently many types $v_2$ left to ensure that types $v_3$ will not be over-represented
in the second segment.

The result below characterizes the set of aggregate markets that, under a sufficiently
strong redistributive motive, would require granting rents to the monopolist.
We denote this set as the rent region.


Result 2. The rent region is given by

$$\operatorname{int}\big(\operatorname{co}(\{\delta_3, \mu^{123}, \mu^{12}, \mu^{23}\})\big).$$


This result is illustrated in the right panel of figure 1.5, where the rent region is
depicted in orange. The complement of this set contains the aggregate
markets for which any redistributive objective can be met without granting rents
to the monopolist (that is, while maximizing total consumer surplus). We call
this set the no-rent region. The following section generalizes the insights presented
through this example. Section 1.4.1 generalizes the fact that optimal segmentations
are efficient and include discount segments supported at markets at which the
monopolist is indifferent between more than one price, while section 1.4.2 establishes
properties of optimal segmentations when the redistributive motive is sufficiently
strong and shows how to construct optimal segmentations in this case. Finally,
section 1.4.3 characterizes generally the no-rent and rent regions and shows that
optimal segmentations for markets belonging to the no-rent region exhibit a very
simple form, with only one discount segment and one uniform price segment.

Figure 1.5: Rent Region.


1.4 Optimal Segmentations 


We now turn to the analysis of the general case. In section 1.4.1 we derive general
properties of optimal segmentations, that is, characteristics that are present in
optimal segmentations given any decreasing welfare weights $\lambda$. Section 1.4.2 then
constructs optimal segmentations under strongly redistributive preferences, that is, when
the weight assigned to lower types is sufficiently larger than the weight assigned
to higher types. Finally, in section 1.4.3, we characterize the set of aggregate markets
for which satisfying a redistributive objective might require granting additional
profits to the monopolist.


1.4.1 General Properties 


Efficient segmentations. Our first result echoes our analysis of efficiency in the
three-value case and establishes that i) we can always restrict ourselves to efficient
segmentations, as long as the weights are non-negative; ii) if the weights are all
strictly positive (i.e. if $\lambda_K > 0$ under our assumption of decreasing weights), only
efficient segmentations can be optimal.


Proposition 1. For any aggregate market $\mu^0$ and any weights $\lambda \in \mathbb{R}^K_+$ (not necessarily
decreasing), there exists an efficient optimal segmentation of $\mu^0$. Furthermore, if
every weight is strictly positive, then any optimal segmentation is efficient.


Proof. This result is a direct consequence of Proposition 1 in Haghpanah and Siegel
(2022b), which itself follows from the proof of Theorem 1 in Bergemann et al.
(2015).


This result relies on the fact that any inefficient market can be segmented in a
Pareto improving manner, that is, in a way that weakly increases the surplus of all
consumers. Hence, as long as the social planner does not assign a negative weight to
any consumer, there must be an efficient optimal segmentation. Proposition 1 thus
implies that segmenting in a redistributive manner never comes at the expense of
efficiency.

Direct segmentations. A segmentation $\sigma$ is direct if all segments in $\sigma$ have
different prices, that is, if for any distinct $\mu, \mu' \in \operatorname{supp}(\sigma)$, $p(\mu) \neq p(\mu')$. Our next lemma
shows that it is without loss of generality to focus on direct segmentations.

Lemma 1. For any aggregate market $\mu^0$ and any segmentation $\sigma \in \Sigma(\mu^0)$, there
exists a direct segmentation $\sigma' \in \Sigma(\mu^0)$ such that

$$\int_{\mathcal{M}} W(\mu) \, \sigma(d\mu) = \int_{\mathcal{M}} W(\mu) \, \sigma'(d\mu).$$


Proof. See Appendix. 


We further show that there always exists an optimal and direct segmentation
that is only supported on the boundaries of price regions $\{\mathcal{M}_k\}_{k \in \mathcal{K}}$. Let
$\mathcal{K}^0 = \{k \in \mathcal{K} \mid v_k \in \operatorname{supp}(\mu^0)\}$ be the set of indices of consumers' types supported by $\mu^0$.


Lemma 2. For any aggregate market $\mu^0$ that is not efficient, there exists an optimal
direct segmentation supported on boundaries of sets $\{\mathcal{M}_k\}_{k \in \mathcal{K}^0}$.


Proof. See Appendix.


This result implies that we can restrict attention, without loss of generality, to finitely
supported segmentations.


1.4.2 Strongly Redistributive Social Preferences 


In this section, we derive some characteristics of the optimal segmentation when the
social planner's preferences are strongly redistributive, that is, when the weights $\lambda$
are strongly decreasing in the type $v$.


Definition 1. The weights $\lambda$ are $\kappa$-strongly redistributive if, for any $k < k' \leq K - 1$,

$$\frac{\lambda_k}{\lambda_{k'}} \geq \kappa.$$

That is, a social planner exhibits $\kappa$-strongly redistributive preferences ($\kappa$-SRP) if
the weight she assigns to a consumer of type $v_k$ is at least $\kappa$ times larger than the
weight she assigns to any consumer of type greater than $v_k$.


Let us define the dominance ordering between any two sets.


Definition 2. Let $X, Y \subset \mathbb{R}$. The set $X$ dominates $Y$, denoted $X \geq_D Y$, if for any
$x \in X$ and any $y \in Y$, $x \geq y$.⁶


We can now state the main result of this section. 


Proposition 2. For any aggregate market $\mu^0$ in the interior of $\mathcal{M}$, there exists $\kappa$
such that if $\lambda$'s are $\kappa$-strongly redistributive, then for any optimal direct segmentation
$\sigma \in \Sigma(\mu^0)$ and any markets $\mu, \mu' \in \operatorname{supp}(\sigma)$, $\mu \neq \mu'$: either $\operatorname{supp}(\mu) \geq_D \operatorname{supp}(\mu')$ or
$\operatorname{supp}(\mu') \geq_D \operatorname{supp}(\mu)$.


Proof. See Appendix.

⁶Note that this definition of dominance is stronger than the strong set order in Topkis (1998).

Figure 1.6: Structure of optimal segmentations under strong redistributive preferences.


The result stated above establishes that, when the social planner’s preferences 
exhibit a sufficiently strong taste for redistribution, optimal segmentations divide 
the type space V into contiguous overlapping intervals, with the overlap between 
any two segments being composed of at most one type. The following corollary is a 


direct consequence of proposition 2: 


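Corollary 1. For any aggregate market $\mu^0$ in the interior of $\mathcal{M}$, there exists $\kappa$ such
that if $\lambda$'s are $\kappa$-strongly redistributive, then for any optimal direct segmentation
$\sigma \in \Sigma(\mu^0)$, any market $\mu \in \operatorname{supp}(\sigma)$ and any $k$ such that $\min\{\operatorname{supp}(\mu)\} < v_k <
\max\{\operatorname{supp}(\mu)\}$: $\sigma(\mu) \, \mu_k = \mu^0_k$.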


The above result states that any segment $\mu$ belonging to a segmentation that
is optimal under strong redistributive preferences contains all of the consumers
with types strictly in between $\min\{\operatorname{supp}(\mu)\}$ and $\max\{\operatorname{supp}(\mu)\}$. Together with
proposition 2, it implies that, under $\kappa$-SRP optimal segmentations, every consumer
type $v$ will belong to at most two segments: either it will belong to the interior
of the support of a segment $\mu$, such that all consumers of this type have surplus
$v - \min(\operatorname{supp}(\mu))$, or it will be the boundary type between two segments $\mu$ and
$\mu'$, such that a fraction of these consumers (those belonging to segment $\mu$) gets
surplus $v - \min(\operatorname{supp}(\mu))$ and the rest gets no surplus. The structure of optimal
segmentations under strong redistributive preferences is illustrated in figure 1.6.

These results, along with proposition 1, completely pin down the $\kappa$-SRP optimal
direct segmentation. One can construct it with the following procedure:


• Step i) Start by creating a segment, call it $\mu_a$, with all consumers of type $v_1$.

• Step ii) Proceed to including in $\mu_a$, successively, all consumers of type $v_2$, then
all of the types $v_3$, and so on. From proposition 1 we know that $\mu_a$ must
be efficient, meaning that we must have $p(\mu_a) = v_1$. As such, the process of
inclusion of types higher than $v_1$ must be halted at the point at which adding
a new consumer to $\mu_a$ would result in $v_1$ no longer being an optimal price in
this segment. We denote as $v_{(a|b)}$ the type that was being included when the
process was halted.

• Step iii) Create a new segment, call it $\mu_b$, with all of the remaining types
$v_{(a|b)}$.

• Step iv) Proceed to including in $\mu_b$, successively, all consumers of type
$v_{(a|b)+1}$, then all of the types $v_{(a|b)+2}$, and so on. Halt this process at the point
at which adding a new consumer to $\mu_b$ would result in $v_{(a|b)}$ no longer being
an optimal price in this segment. We denote as $v_{(b|c)}$ the type that was being
included when the process was halted.

• Step v) Create a new segment with all of the remaining types $v_{(b|c)}$. Repeat
the process described in the last steps until every consumer has been allocated
to a segment.
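
The steps above can be sketched as a greedy routine. The Python code below is a minimal illustration under stated assumptions: the aggregate market is interior (every type has positive mass), segments are returned as vectors of consumer masses rather than normalized markets, and the function name `srp_segmentation` is hypothetical. The halting point in steps ii) and iv) is computed in closed form as the largest mass of the incoming type that keeps the segment's lowest type an optimal price.

```python
from fractions import Fraction as F

def srp_segmentation(values, mu0):
    """Greedy construction; returns segments as vectors of consumer masses."""
    K = len(values)
    remaining = list(mu0)
    segments = []

    def open_segment(lo):
        # New segment holding all remaining consumers of type `lo`.
        seg = [F(0)] * K
        seg[lo], remaining[lo] = remaining[lo], F(0)
        return seg

    lo = 0
    seg = open_segment(lo)
    for j in range(1, K):
        # Largest mass of type j that keeps values[lo] optimal against every
        # alternative price values[i] with lo < i <= j.
        cap = min(
            (values[lo] * sum(seg) - values[i] * sum(seg[i:])) / (values[i] - values[lo])
            for i in range(lo + 1, j + 1)
        )
        added = min(cap, remaining[j])
        seg[j] += added
        remaining[j] -= added
        if remaining[j] > 0:  # type j is the boundary type: halt, open a new segment
            segments.append(seg)
            lo = j
            seg = open_segment(lo)
    segments.append(seg)
    return segments

for seg in srp_segmentation((1, 2, 3), (F(3, 10), F(2, 5), F(3, 10))):
    print([str(x) for x in seg])
# ['3/10', '3/10', '0']
# ['0', '1/10', '1/5']
# ['0', '0', '1/10']
```

On the aggregate market $(0.3, 0.4, 0.3)$ of section 1.3 the routine returns three segments supported on $\{v_1, v_2\}$, $\{v_2, v_3\}$ and $\{v_3\}$, matching the supports described in case (ii) of Result 1. On $(0.2, 0.65, 0.15)$ it returns only two segments, supported on $\{v_1, v_2\}$ and $\{v_2, v_3\}$.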


1.4.3. Optimal Segmentations and Informational Rents 


This section explores the question of when an optimal segmentation maximizes 
total consumer surplus or, conversely, when it grants a rent to the monopolist. 

Say that an aggregate market $\mu$ belongs to the rent region if there exists some $\kappa$ 
such that if the social planner has $\kappa$-strongly redistributive preferences, the optimal 
segmentation grants a rent to the monopolist. Conversely, call the no-rent region 
the set of aggregate markets for which any optimal segmentation with redistributive 
preferences also maximizes total consumer surplus. 

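Before we characterize the rent and no-rent regions, we define a particular 
segmentation, which we will call $\sigma^{NR}$: 

Definition 3. Let $\mu^0$ be an aggregate market with uniform price $v_u$. Call $\sigma^{NR}$ the 
segmentation that splits $\mu^0$ into two segments $\mu'$ and $\mu''$, with respective weights 
$\sigma$ and $1 - \sigma$, such that: 

$$\mu' = \left(\frac{\mu^0_1}{\sigma}, \frac{\mu^0_2}{\sigma}, \dots, \mu'_u, 0, \dots, 0\right),$$

$$\mu'' = \left(0, 0, \dots, \mu''_u, \frac{\mu^0_{u+1}}{1-\sigma}, \dots, \frac{\mu^0_K}{1-\sigma}\right),$$

where $\mu'_u = v_1/v_u$, $\mu''_u = (\mu^0_u - \sigma\mu'_u)/(1-\sigma)$ and $\sigma = \left(v_u \sum_{k=1}^{u-1} \mu^0_k\right)/(v_u - v_1)$. 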


Segmentation $\sigma^{NR}$ is very simple and generates only two segments: one pooling 
all the consumers who would not buy the good on the unsegmented market (those 
with type lower than $v_u$) and another one pooling all the consumers who would 
buy the good on the unsegmented market (those with type higher than $v_u$). Under 
segmentation $\sigma^{NR}$, the only consumer type that gets assigned to two different 
segments is $v_u$. 

Figure 1.7: Segmentation $\sigma^{NR}$. 


Proposition 3. An aggregate market $\mu^0$ belongs to the no-rent region if and only if 
$\sigma^{NR}$ is an efficient segmentation of $\mu^0$. 


Proof. See Appendix. 


Proposition 3 establishes a simple criterion that defines whether an aggregate 
market belongs to the no-rent region: it suffices to check if, under $\sigma^{NR}$, $p(\mu') = v_1$ 
and $p(\mu'') = v_u$. Whenever this is not true, the aggregate market belongs to the rent 
region. 
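
The criterion of proposition 3 can be checked numerically for a finite market. The sketch below is a hedged illustration: it builds the two segments of $\sigma^{NR}$ from the formulas in definition 3 and tests whether $v_1$ is an optimal price in the lower segment and $v_u$ in the upper one. The helper names (`optimal_prices`, `nr_segmentation`, `in_no_rent_region`), the choice of the lowest revenue-maximizing price as the uniform price, and the error raised when the formulas yield an infeasible segment are all assumptions of this sketch.

```python
def optimal_prices(values, dist, tol=1e-9):
    """Indices of revenue-maximizing prices for a (normalized) market `dist`."""
    revenue = [v * sum(dist[k:]) for k, v in enumerate(values)]
    best = max(revenue)
    return [k for k, r in enumerate(revenue) if r >= best - tol]

def nr_segmentation(values, market, tol=1e-9):
    """Build the two segments of sigma^NR from the formulas in definition 3."""
    u = optimal_prices(values, market)[0]  # assumption: lowest optimal uniform price
    if u == 0:
        raise ValueError("uniform price is v_1: no consumer is excluded")
    sigma = values[u] * sum(market[:u]) / (values[u] - values[0])
    mu_low_u = values[0] / values[u]
    if sigma >= 1 - tol:
        raise ValueError("lower segment would absorb the whole market")
    mu_high_u = (market[u] - sigma * mu_low_u) / (1 - sigma)
    if mu_high_u < -tol:
        raise ValueError("not enough consumers of type v_u (case not covered here)")
    n = len(values)
    mu_low = [m / sigma for m in market[:u]] + [mu_low_u] + [0.0] * (n - u - 1)
    mu_high = [0.0] * u + [mu_high_u] + [m / (1 - sigma) for m in market[u + 1:]]
    return u, sigma, mu_low, mu_high

def in_no_rent_region(values, market):
    """True if sigma^NR is efficient: p(mu') = v_1 and p(mu'') = v_u."""
    u, _, mu_low, mu_high = nr_segmentation(values, market)
    return 0 in optimal_prices(values, mu_low) and u in optimal_prices(values, mu_high)
```

With types (1, 2, 3), the market (0.3, 0.5, 0.2) passes the check, whereas the market (0.3, 0.4, 0.3) fails it because price 3 becomes strictly optimal in the upper segment.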


Corollary 2. Consider an aggregate market $\mu^0$. If $\sigma^{NR}$ is not an efficient 
segmentation of $\mu^0$, then there exists $\kappa$ such that, if welfare weights $\lambda$ are $\kappa$-strongly 
redistributive, any optimal segmentation grants a rent to the monopolist. 


The intuition for the results above is as follows. A market belongs to the no-rent 
region if, given any redistributive preferences, its optimal segmentation maximizes 
total consumer surplus. On the one hand, we know from proposition 2 that, under 
strong redistributive preferences, optimal segmentations divide the type space into 
overlapping intervals, with the overlap between two segments consisting of at 
most one type. On the other hand, a necessary and sufficient condition 
for total consumer surplus to be maximized is that i) the segmentation is efficient 
and ii) the uniform price $v_u$ is an optimal price in every segment generated by this 
segmentation. Condition i) ensures that total surplus is maximized, while condition 
ii) ensures that producer surplus is kept at its uniform price level, meaning that all 
of the surplus created by the segmentation goes to consumers. Since condition ii) 
can only be satisfied if type $v_u$ belongs to the support of all segments, we get that 
the conditions for optimality under strong redistributive preferences and for total 
consumer surplus to be maximized can only be simultaneously met by a segmentation 
that generates only two segments, with the overlap in the support of both segments 
consisting of $v_u$. 
Such a segmentation indeed maximizes total consumer surplus if it is efficient 
and if $v_u$ is an optimal price in both segments. This is the case if $v_1$ and $v_u$ are both 
optimal prices in the lower segment, and if $v_u$ is an optimal price in the 
upper segment. Segmentation $\sigma^{NR}$ is the only segmentation that can potentially 
satisfy all of these conditions at once, as it includes in the lower segment the exact 
proportion of types $v_u$ that would make the monopolist indifferent between charging 
a price of $v_1$ or $v_u$. As such, segmentation $\sigma^{NR}$ maximizes total consumer surplus if 
and only if it is efficient. 


Corollary 3. If an aggregate market $\mu^0$ belongs to the no-rent region, then $\sigma^{NR}$ is 
its only direct consumer-optimal segmentation under any redistributive preferences. 


This result establishes that, for markets in the no-rent region, optimal segmentations 
have an extremely simple structure: they only generate a discount segment 
with price $v_1$, pooling all the types who would not consume under the uniform price 
and some of the types $v_u$, and a residual segment with price $v_u$, containing all of the 
remaining consumers. Furthermore, this segmentation must be optimal under any 
decreasing welfare weights $\lambda$. As such, this result selects for the markets belonging 
to the no-rent region one among the many uniformly weighted consumer-optimal 
segmentations that were outlined in Bergemann et al. (2015). 

Due to the structure of segmentation $\sigma^{NR}$, all of the surplus that is generated 
by the segmentation is given to consumers with types below or equal to $v_u$, all of 
whom get the maximum surplus they could potentially get. Since it is impossible 
to raise the surplus of any type below $v_u$, and impossible to raise the surplus of 
types above $v_u$ without redistributing from lower to higher types, this segmentation 
must be optimal whenever the weights assigned to different consumers are (weakly) 
decreasing in the type. 

The results in this section establish that there are essentially two types of markets: 
those for which redistribution can be carried out among consumers only, while keeping 
total consumer surplus maximal, and those for which increasing the surplus of lower 
types past a certain point necessarily decreases the total pie of surplus accruing to 
consumers and grants additional profits to the monopolist. 


Chapter 2 


Persuading a Wishful Thinker 


Abstract 


We analyze a model of persuasion in which Receiver forms wishful non- 
Bayesian beliefs. The effectiveness of persuasion depends on Receiver’s 
material stakes: it is more effective when intended to encourage risky 
behavior that potentially leads to a high payoff and less effective when 
intended to encourage more cautious behavior. We illustrate this insight 
with applications showing why informational interventions are often 
ineffective in inducing greater investment in preventive health treatments, 
how financial advisors might take advantage of their clients’ overoptimistic 
beliefs and why strategic information disclosure to voters with different 
partisan preferences can lead to belief polarization in an electorate. 


JEL classification codes: D82; D83; D91. 
Keywords: non-Bayesian persuasion; motivated thinking; overoptimism; optimal 


beliefs. 
2.1 Introduction 


It is generally assumed in models of strategic communication that receivers update 
beliefs in a perfectly rational manner, as would a Bayesian statistician. Yet, a 
substantial literature in psychology and behavioral economics shows that the process 
by which individuals interpret information and form beliefs is not guided solely by a 
desire for accuracy but often depends on their motivations and material incentives. 
This phenomenon is generally referred to as motivated inference (Kunda, 1987, 1990), 
and a common manifestation of it is wishful thinking: the tendency of individuals 
to let their preferences about outcomes influence the way they process information, 
leading to beliefs that are systematically biased towards outcomes they wish to be 
true.¹ In this paper we investigate how wishful thinking affects the effectiveness of 
persuasion, i.e., the probability or frequency with which a sender is able to induce a 
receiver to take her preferred action. 

Following Caplin and Leahy (2019), we propose a model in which the receiver’s 
belief updating rule is non-Bayesian: after observing an informative signal, Receiver 
forms beliefs by trading off their anticipatory value against the psychological cost 
of distorting beliefs away from Bayesian ones. As a result, Receiver’s beliefs are 
stakes-dependent, i.e., they depend on his preferences, and overweight the state 
associated with the highest payoff, giving rise to overoptimism. 

Distortions in beliefs lead to distortions in Receiver’s behavior: some actions end 
up being favored, meaning that they are taken more often (i.e., after the reception of 
a strictly greater set of possible signals) relative to a Bayesian decision-maker. When 
he only has two available actions, wishful thinking leads Receiver to favor the action 
associated with the highest payoff and the highest payoff variability. If one of the two 
actions induces the highest possible payoff and the other induces the highest payoff 
variability, then which of the two is favored depends on the magnitude of Receiver’s 
belief distortion cost. As such, the effectiveness of information provision as a tool to 
incentivize agents might vary with individuals’ material stakes: persuasion is more 
effective when it is aimed at encouraging behavior that is risky but can potentially yield 
very high returns and less effective when it is aimed at encouraging more cautious 
behavior. We illustrate this insight in applications in which wishful beliefs can play 


an important role. 


¹ There exists abundant experimental evidence of wishful thinking. See in particular Bénabou 
and Tirole (2016), page 150 and Benjamin (2019) Section 9, as well as, e.g., Weinstein (1980), 
Mijovic-Prelec and Prelec (2010), Mayraz (2011), Heger and Papageorge (2018), Coutts (2019), 
Engelmann et al. (2019) or Jiao (2020). 
Application 1: Information Provision and Preventive Health Care. In 
this application a public health agency designs an information policy about the risk 
of infection of an illness in order to promote a preventive treatment that can be 
adopted by individuals at some cost. Since not adopting the treatment is the action 
that can potentially yield the highest payoff (in case the illness is not severe) and 
also the action with the highest payoff variability, it is favored by wishful receivers. 
As such, information campaigns aimed at promoting preventive behavior are less 
effective. We also show how the effectiveness of information campaigns is affected 
by the severity of the disease and the effectiveness of the treatment. 

This application sheds light on the stylized fact that individuals are consistently 
investing too little in preventive health care treatments, even if offered at low prices 
(especially in developing countries, see Dupas, 2011; Chandra et al., 2019; Kremer 
et al., 2019, Section 3.1) and that informational interventions are often ineffective in 
inducing more investment in preventive health care devices (see, in particular, Dupas, 
2011, Section 4, and Kremer et al., 2019, Section 3.3). Recent literature conjectures 
that individuals might not be responsive to such information campaigns because 
they prefer to hold optimistic prospects about their health risks (see Schwardmann, 


2019 and Kremer et al., 2019, Section 3.3).² Our model formalizes this argument. 


Application 2: Persuading a Wishful Investor. In this application, we consider 
the interaction between a financial broker and her potential client. The broker designs 
reports about the (continuously distributed) return of some risky financial product 
to persuade the client to buy the asset. We show that a financial broker interested in 
selling a risky product is always more effective when persuading a wishful investor. 

This application formalizes why some professional financial advisors might sometimes 
not act in the best interest of their clients by making investment recommendations 
that take advantage of their biases and mistaken beliefs (see, for instance, 
Mullainathan et al., 2012 or Beshears et al., 2018, Section 9) as well as why some 
consulting firms seem to specialize in advice misconduct and cater to biased consumers 
(Egan et al., 2019). It also helps explain why the online betting industry 
puts so much effort into persuasion. Indeed, Babad and Katz (1991) document 
that individuals generally display wishful thinking when they take part in lotteries: 
they prefer to think they will win and are therefore more receptive to information 


encouraging risky bets. 


² There exists compelling experimental evidence that such self-deception exists in the medical 
testing context (Lerman et al., 1998; Oster et al., 2013; Ganguly and Tasoff, 2017). 
Application 3: Public Persuasion and Political Polarization. Belief polarization 
along partisan lines is a pervasive and much debated feature of contemporary 
societies. Although such polarization can be partly caused by differential access to 
information, evidence suggests that it is exacerbated by the fact that individuals 
tend to make motivated inferences about the same piece of information (Babad, 
1995; Thaler, 2020). 

In this application we explore the relationship between optimal information 
disclosure to wishful citizens and belief polarization. Following Alonso and Camara 
(2016), we model a majority voting setting in which an electorate, differentiated in 
terms of partisan preferences, uses information disclosed by a politician to vote on a 
proposal. Wishful thinking leads voters with different preferences to adopt different 
beliefs after being exposed to a public signal: those voting against or for the proposal 
distort their beliefs in opposite directions, giving rise to polarization. Sender’s 
optimal public experiment consists in persuading the median voter, which maximizes 
the number of voters distorting beliefs in opposite directions. We show that if 
partisan preferences are symmetrically distributed around the median, then Sender’s 
optimal information policy generates maximal belief polarization in the electorate as 
a byproduct. This adds nuance to the argument that motivated thinking is one of 
the drivers of polarization: not only can motivated thinking lead to polarization, but 
the strategic disclosure of information to a motivated electorate can also accentuate 


this tendency.³ 


2.1.1 Related literature 


The persuasion and information design literature⁴ initially focused on the problem 
of influencing rational Bayesian decision-makers, as in the seminal contributions of 
Kamenica and Gentzkow (2011) and Bergemann and Morris (2016). By introducing 
non-Bayesian updating in the form of motivated belief formation, we contribute to 
the literature studying persuasion of receivers subject to mistakes in probabilistic 


³ This application is related to the paper by Le Yaouanq (2021), who constructs a model of large 
elections with motivated voters. As in our model, the formation of motivated beliefs by citizens 
leads voters with different preferences to hold different beliefs after observing the same information. 
We find, as he does, that greater heterogeneity in partisan preferences increases belief polarization 
but has no effect on the policy implemented in equilibrium. This is, however, the consequence of a 
different modelling assumption, namely that information is endogenously designed to persuade the 
median voter, whose vote is not distorted relative to a Bayesian voter. 

⁴ See Bergemann and Morris (2019) and Kamenica (2019) for reviews of this literature. 
inferences.⁵,⁶ Levy et al. (2018) analyze a Bayesian persuasion problem where a sender 
can send multiple signals to a receiver subject to correlation neglect. Benjamin et al. 
(2019) provide an example of a persuasion game where Receiver exhibits base-rate 
neglect when updating beliefs. In de Clippel and Zhang (2020) the receiver holds 
subjective beliefs which belong to a broader class of distorted Bayesian posteriors. 
In contrast, in our model, Receiver’s belief formation process optimally trades off the 
benefits and costs associated with maintaining non-Bayesian beliefs as in the work of 
Caplin and Leahy (2019). 

On the one hand, we assume that Receiver’s value from maintaining inaccurate 
beliefs comes from the anticipation of the payoff he will achieve in equilibrium. 
Intuitively, it represents the idea that individuals might derive utility from the 
anticipation of future outcomes, be they good or bad. This hypothesis has been 
widely used in the literature to study how anticipatory emotions affect physical 
choices (see, e.g., Loewenstein, 1987; Caplin and Leahy, 2001) as well as choices 
of beliefs (Bénabou and Tirole, 2002; Brunnermeier and Parker, 2005; Bracha and 
Brown, 2012; Caplin and Leahy, 2019). Receiver’s choice of beliefs is thus a way 
of satisfying his psychological need to be optimistic about the best-case outcomes 
or, on the contrary, to avoid the dread and anxiety associated with the worst-case 
outcomes. This hypothesis is supported experimentally by Engelmann et al. (2019), 
who find significant evidence that wishful thinking is caused by the desire to reduce 
anxiety associated with anticipating bad events. It is important to note that while 
anticipatory utility may be a strong motive for manipulating one’s beliefs, it is not 
the only possible one. This differentiates wishful thinking from the more general 
concept of motivated reasoning, which is usually defined as the degree to which 
individuals’ cognition is affected by their motivations.⁷ Motivations other than 
anticipated payoffs have been explored in the literature, such as cognitive dissonance 
avoidance (Akerlof and Dickens, 1982; Golman et al., 2016), preference to believe 
in a “Just World” (Bénabou and Tirole, 2006), maintaining high motivation when 


individuals are aware of being subject to a form of time-inconsistency (Bénabou and 


⁵ See Benjamin (2019) for a review of the literature. In particular, wishful thinking belongs to 
preference-biased inferences reviewed in Benjamin (2019), Section 9. 

⁶ It is interesting to note that an active literature also explores how errors in strategic reasoning 
(Eyster, 2019) affect equilibrium outcomes in strategic communication games. Although in our 
model Receiver understands all the strategic issues, we believe, nevertheless, that it is important 
to mention that players’ misunderstanding of their strategic environment might also lead them to 
make errors in statistical inference even if they update beliefs via Bayes’ rule, as in Mullainathan 
et al. (2008), Ettinger and Jehiel (2010), Hagenbach and Koessler (2020) and Eliaz et al. (2021b,a) 
who consider communication games where players make inferential errors because of a coarse 
understanding of their environment. 

⁷ See Krizan and Windschitl (2009) for a more detailed discussion on the differences between 
wishful thinking and motivated reasoning. 
Tirole, 2002, 2004) or satisfying the need to belong to a particular identity (Bénabou 
and Tirole, 2011). 

On the other hand, we assume distorting beliefs away from the Bayesian bench- 
mark is subject to some psychological cost. This assumption reflects the idea that, 
under a motivated cognition process (Kunda, 1987, 1990), individuals may use sophis- 
ticated mental strategies such as manipulating their own memory (Bénabou, 2015; 
Bénabou and Tirole, 2016),⁸ avoiding freely available information (Golman et al., 
2017) or creating elaborate narratives supporting their bad choices or inaccurate 
claims to justify their preferred beliefs.⁹ Our assumptions on the cost function 
capture, in “reduced form”, the fact that implementing such mental strategies comes 
at a cost when desired beliefs deviate from the Bayesian rational ones. In 
contrast, Brunnermeier and Parker (2005) model the cost of erroneous beliefs as the 
instrumental loss associated with the inaccurate choices induced by such beliefs. It 
is worth noting that Coutts (2019) provides experimental evidence in favor of the 


psychological rather than instrumental costs associated with belief distortion. 


2.2. Model 


States and prior belief. A state of the world $\theta$ is drawn by Nature from a state 
space $\Theta$ according to a prior distribution $\mu_0 \in \mathrm{int}(\Delta(\Theta))$.¹⁰ Receiver (he) and Sender 
(she) do not observe the state ex-ante but its prior distribution is common knowledge. 


Actions and payoffs. Receiver chooses an action $a$ from a compact space $A$ with 
at least two actions. His material payoff is given by $u(a, \theta)$.¹¹ Receiver’s choice affects 
Sender’s payoff, which is given by $v(a)$. Before Receiver takes his action, Sender can 
commit to any signal structure $(\sigma, S)$ given by an endogenously chosen set of signal 
realizations $S$ and a stochastic mapping $\sigma: \Theta \to \Delta(S)$ associating any realized state 
$\theta$ to a conditional distribution $\sigma(\theta)$ over $S$. 


⁸ For experimental evidence on memory manipulation see, e.g., Saucet and Villeval (2019), 
Carlson et al. (2020) and Chew et al. (2020). 

⁹ One can relate this possible microfoundation of the belief distortion cost to the literature on 
lying costs (Abeler et al., 2014, 2019) since, when Receiver is distorting his subjective belief 
away from the rational Bayesian beliefs, he is essentially lying to himself. We thank Emeric Henry for 
suggesting this interpretation of the cost function to us. 

¹⁰ In what follows, for any nonempty Polish space $X$, we denote by $\Delta(X)$ the set of Borel probability 
measures over the measurable space $(X, \mathcal{B}(X))$. We always endow $\Delta(X)$ with the weak*-topology. If 
the support of a measure $\mu \in \Delta(X)$ is finite we adopt the shorthand notation $\mu(x) = \mu(\{x\})$ for 
any $x \in \mathrm{supp}(\mu)$. 

¹¹ We assume the map $u(a, \cdot): \Theta \to \mathbb{R}$ to be Borel measurable, continuous and bounded for any 
$a \in A$. 
Receiver’s behavior. For any belief $\eta \in \Delta(\Theta)$, Receiver’s optimal action 
correspondence is given by 

$$A(\eta) = \arg\max_{a \in A} \int_{\Theta} u(a, \theta)\, \eta(d\theta).$$

Without loss of generality, we assume that no action is dominated, i.e., for any action 
$a \in A$ there always exists some belief $\eta$ such that $a \in A(\eta)$. When the set $A(\eta)$ has 
more than one element we break the tie in favor of Sender. That is, for any belief 
$\eta$, the action played by Receiver in equilibrium is given by a selection $a(\eta) \in A(\eta)$ 
which maximizes Sender’s expected payoff.¹² 


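Receiver’s beliefs. After observing any signal realization $s \in S$, a Bayesian 
decision-maker’s belief is given by Bayes’ rule, 

$$\mu(\tilde{\Theta} \mid s) = \frac{\int_{\tilde{\Theta}} \sigma(s \mid \theta)\, \mu_0(d\theta)}{\int_{\Theta} \sigma(s \mid \theta)\, \mu_0(d\theta)},$$

for any Borel set $\tilde{\Theta} \subseteq \Theta$. 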

In contrast, we assume that, when forming beliefs, Receiver trades off the 
psychological benefit against the psychological cost of holding possibly non-Bayesian 
beliefs. The psychological benefit of Receiver under a certain belief $\eta$ is given by his 
anticipated material payoff 

$$U(\eta) = \int_{\Theta} u(a(\eta), \theta)\, \eta(d\theta).$$


However, holding belief $\eta$ when the Bayesian belief generated by some signal is $\mu$ 
comes at a psychological cost $C(\eta, \mu)$ for Receiver. We assume that this cost is given 
by the Kullback-Leibler divergence between $\eta$ and $\mu$, formally defined by 

$$C(\eta, \mu) = \int_{\Theta} \frac{d\eta}{d\mu}(\theta) \ln\left(\frac{d\eta}{d\mu}(\theta)\right) \mu(d\theta),$$

for any $\eta, \mu \in \Delta(\Theta)$, where $d\eta/d\mu$ is the Radon-Nikodym derivative of $\eta$ with 
respect to $\mu$, defined whenever $\eta$ is absolutely continuous with respect to $\mu$. This 
assumption is made for tractability but does not qualitatively affect our main results.¹³ 


¹² There might be more than one such selection if there exists some $\eta \in \Delta(\Theta)$ at which Sender is 
indifferent between some actions in $A(\eta)$. In that case, we pick arbitrarily one of those. 

¹³ We show that our results on Receiver’s equilibrium beliefs and behavior continue to hold 
when the psychological cost function belongs to a more general class of statistical divergences in 
appendix B.1. 

Accordingly, we define Receiver’s psychological payoff as 

$$W(\eta, \mu) = U(\eta) - \frac{1}{\rho} C(\eta, \mu)$$

for any $\eta, \mu \in \Delta(\Theta)$, where $\rho \in \mathbb{R}_+^*$ parametrizes the extent of Receiver’s wishfulness. 
Receiver’s belief $\eta$ must maximize his psychological payoff given any Bayesian belief 
$\mu$. Therefore, it must belong to the optimal beliefs correspondence 

$$B(\mu) = \arg\max_{\eta \in \Delta(\Theta)} W(\eta, \mu),$$

for any $\mu \in \Delta(\Theta)$, and Receiver’s psychological payoff when he holds a belief $\eta \in B(\mu)$ 
is 

$$W(\mu) = \max_{\eta \in \Delta(\Theta)} W(\eta, \mu)$$

for any Bayesian posterior $\mu \in \Delta(\Theta)$.¹⁴ We assume that when Receiver is psychologically 
indifferent between several beliefs in $B(\mu)$ he picks the one that maximizes 
Sender’s expected utility. Therefore, Receiver’s equilibrium belief is given by a 
selection $\eta(\mu) \in B(\mu)$ which maximizes Sender’s expected payoff.¹⁵ This tie-breaking 
rule ensures that Receiver’s equilibrium belief is uniquely defined and simplifies 
the characterization of the optimal information policy. 

Persuasion problem. We can equivalently think of Sender committing ex-ante to 
a signal structure $(\sigma, S)$ or to an information policy $\tau \in \mathcal{T}(\mu_0)$, where 

$$\mathcal{T}(\mu_0) = \left\{ \tau \in \Delta(\Delta(\Theta)) : \int_{\Delta(\Theta)} \mu(\tilde{\Theta})\, \tau(d\mu) = \mu_0(\tilde{\Theta}) \text{ for any Borel set } \tilde{\Theta} \subseteq \Theta \right\},$$

is the set of Bayes-plausible distributions over posterior beliefs given the prior $\mu_0$. 

We assume Sender knows Receiver is a wishful thinker. Accordingly, she correctly 

¹⁴ As already noted by Bracha and Brown (2012) as well as Caplin and Leahy (2019), this 
optimization problem has a similar mathematical structure to the multiplier preferences developed 
in Hansen and Sargent (2008) and axiomatized in Strzalecki (2011). Precisely, the agent in Strzalecki 
(2011) solves 

$$\max_{a \in A} \min_{\eta \in \Delta(\Theta)} \int_{\Theta} u(a, \theta)\, \eta(d\theta) + \frac{1}{\rho} C(\eta, \mu), \tag{2.1}$$

for any given $\mu \in \Delta(\Theta)$. In that model, the parameter $\rho$ measures the degree of confidence 
of the decision-maker in the belief $\mu$ or, in other words, the importance he attaches to belief 
misspecification. Conclusions on the belief distortion in that setting are naturally reversed with 
respect to our model: a receiver forming beliefs according to equation (2.1) would form overcautious 
beliefs. Studying how a rational Sender would persuade a Receiver concerned by robustness seems 
an interesting path for future research. 

¹⁵ Again, if Sender is indifferent between some beliefs we pick arbitrarily one of those. 
anticipates the belief Receiver holds in equilibrium. Since Receiver’s equilibrium belief 
characterizes how he would distort his belief away from any realized Bayesian posterior, 
Sender can choose the best information policy by backward induction, knowing: 
(i) which belief $\eta(\mu)$ Receiver holds in equilibrium after a posterior $\mu \in \mathrm{supp}(\tau)$ 
is realized and (ii) which action $a(\eta(\mu))$ Receiver chooses in equilibrium given the 
distorted belief $\eta(\mu)$. Sender’s indirect payoff function is therefore given by 

$$v(\mu) = v(a(\eta(\mu)))$$

for any $\mu \in \Delta(\Theta)$ and, hence, Sender’s value from persuading a wishful Receiver 
under the prior $\mu_0$ is 

$$V(\mu_0) = \max_{\tau \in \mathcal{T}(\mu_0)} \int_{\Delta(\Theta)} v(\mu)\, \tau(d\mu). \tag{2.2}$$


2.3. Receiver’s wishful beliefs and behavior 


In this section, we first extend Caplin and Leahy’s (2019) results by characterizing 
Receiver’s equilibrium beliefs and behavior without imposing any restrictions on the 
action or state space. 

To begin with, let Receiver’s anticipated material payoff under action $a$ and belief 
$\eta$ be defined by 

$$U_a(\eta) = \int_{\Theta} u(a, \theta)\, \eta(d\theta).$$

Moreover, let 

$$\eta_a(\mu) = \arg\max_{\eta \in \Delta(\Theta)} U_a(\eta) - \frac{1}{\rho} C(\eta, \mu),$$

be Receiver’s belief motivated by action $a$ under posterior $\mu$ and 

$$W_a(\mu) = \max_{\eta \in \Delta(\Theta)} U_a(\eta) - \frac{1}{\rho} C(\eta, \mu),$$

be Receiver’s maximal psychological payoff motivated by action $a$ under posterior $\mu$. 
We identify Receiver’s equilibrium belief $\eta(\mu)$ by: (i) finding the belief motivated 
by action $a$ under $\mu$, resulting in psychological payoff $W_a(\mu)$, for any $a$ and $\mu$; (ii) 
finding which action it is optimal to motivate by maximizing $W_a(\mu)$ with respect to 
$a$. Proposition 4 characterizes $\eta_a(\mu)$ and $W_a(\mu)$ in closed form. 


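Proposition 4. Receiver’s maximal psychological payoff motivated by action $a$ under 
the Bayesian posterior $\mu$ is given by 

$$W_a(\mu) = \frac{1}{\rho} \ln\left( \int_{\Theta} \exp(\rho u(a, \theta))\, \mu(d\theta) \right), \tag{2.3}$$

and is attained uniquely at the belief 

$$\eta_a(\mu)(\tilde{\Theta}) = \frac{\int_{\tilde{\Theta}} \exp(\rho u(a, \theta))\, \mu(d\theta)}{\int_{\Theta} \exp(\rho u(a, \theta))\, \mu(d\theta)} \tag{2.4}$$

for any Borel set $\tilde{\Theta} \subseteq \Theta$. 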


Proof. See appendix B.1. 
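
The closed forms in proposition 4 can be checked numerically when the state space is finite. The sketch below is only an illustration under that finiteness assumption: `wishful_belief` implements the exponential tilting of equation (2.4), `psych_payoff` evaluates anticipated payoff minus the scaled Kullback-Leibler cost, and `closed_form_value` evaluates equation (2.3). All function names and the numerical values used afterwards are hypothetical.

```python
import math

def wishful_belief(mu, payoffs, rho):
    """Equation (2.4) on a finite state space: tilt mu by exp(rho * u(a, theta))."""
    weights = [m * math.exp(rho * u) for m, u in zip(mu, payoffs)]
    total = sum(weights)
    return [w / total for w in weights]

def kl_divergence(eta, mu):
    """Kullback-Leibler divergence of eta from mu (eta absolutely continuous w.r.t. mu)."""
    return sum(e * math.log(e / m) for e, m in zip(eta, mu) if e > 0)

def psych_payoff(eta, mu, payoffs, rho):
    """Anticipated payoff under eta minus (1/rho) times the belief distortion cost."""
    anticipated = sum(e * u for e, u in zip(eta, payoffs))
    return anticipated - kl_divergence(eta, mu) / rho

def closed_form_value(mu, payoffs, rho):
    """Equation (2.3) on a finite state space."""
    return math.log(sum(m * math.exp(rho * u) for m, u in zip(mu, payoffs))) / rho
```

With a uniform Bayesian posterior over two states, payoffs (1, 0) for the action considered and $\rho = 2$, the tilted belief puts more weight on the first state, and its psychological payoff coincides with the closed-form value while exceeding the payoff from holding the Bayesian belief itself.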


Remark now that if the action $a$ uniquely maximizes Receiver’s psychological 
payoff under Bayesian posterior $\mu$ we have $\eta(\mu) = \eta_a(\mu)$. If, on the other hand, 
$W_{a'}(\mu) = W_a(\mu)$ at $\mu$ for some $a' \neq a$, meaning that Receiver is psychologically 
indifferent between two beliefs, then Sender breaks the tie. As a consequence, if 
$\mu \in \Delta(\Theta)$ satisfies 

$$W_a(\mu) > W_{a'}(\mu), \tag{2.5}$$

for all $a' \neq a$, meaning that Receiver psychologically prefers action $a$ to any other 
action $a'$, then Receiver’s equilibrium belief is given by 

$$\eta(\mu)(\tilde{\Theta}) = \eta_a(\mu)(\tilde{\Theta}),$$

for any Borel set $\tilde{\Theta} \subseteq \Theta$. If $\mu \in \Delta(\Theta)$ satisfies 

$$W_a(\mu) = W_{a'}(\mu)$$

for some $a' \neq a$, meaning that Receiver is psychologically indifferent between some 
actions $a'$ and $a$, then Sender picks her preferred belief given by 

$$\eta(\mu)(\tilde{\Theta}) = \eta_{a^*}(\mu)(\tilde{\Theta}),$$

where $a^* \in \arg\max_{\tilde{a} \in \{a, a'\}} v(\tilde{a})$. 

First, we can see from equation (2.4) that Receiver only distorts beliefs that induce 
actions with state-dependent payoffs, i.e., Receiver’s beliefs are stakes-dependent. 
Formally, for any $a \in A$, we have $\eta_a(\mu) \neq \mu$ if, and only if, there exists $\theta \neq \theta'$ such 
that $u(a, \theta) \neq u(a, \theta')$. Second, Receiver forms beliefs that overweight the states 
associated with the highest payoff, giving rise to overoptimism. Formally, we always 
have $\eta_a(\mu)(\Theta_a) \geq \mu(\Theta_a)$ for any $a \in A$ where $\Theta_a = \arg\max_{\theta \in \Theta} u(a, \theta)$. Moreover, 
Receiver’s belief about payoff maximizing states $\eta_a(\mu)(\Theta_a)$ grows monotonically and 
eventually converges to 1 as Receiver’s wishfulness $\rho$ grows from 0 to $+\infty$.¹⁶ 

As proposition 4 shows, wishful thinking leads Receiver to hold overoptimistic 
beliefs. The next result shows that wishful thinking distorts Receiver’s behavior 
accordingly. 


Corollary 4. Under his equilibrium belief, Receiver’s optimal action correspondence 
is given by 

$$A(\eta(\mu)) = \arg\max_{a \in A} \int_{\Theta} \exp(\rho u(a, \theta))\, \mu(d\theta),$$

for any $\mu \in \Delta(\Theta)$, so Receiver’s equilibrium action $a(\eta(\mu))$ corresponds to Sender’s 
preferred selection in $A(\eta(\mu))$. 


Note that this result comes as a direct consequence of proposition 4 as, by 
definition, any action $a$ is optimal under the belief motivated by action $a$. As already 
observed by Caplin and Leahy (2019), the previous result states, in essence, that a 
Receiver forming wishful beliefs behaves as a Bayesian agent whose preferences are 
distorted by the function $z \mapsto \exp(\rho z)$ for any $z \in \mathbb{R}$. Importantly, from Sender’s 
point of view, a wishful Receiver’s behavior is indistinguishable from that of a 
Bayesian rational agent with payoff function $\exp(\rho u(a, \theta))$. Accordingly, since the 
function $z \mapsto \exp(\rho z)$ is strictly convex as soon as $\rho > 0$, an agent forming wishful 
beliefs is less risk averse than his Bayesian self. 
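
The behavioral equivalence stated in corollary 4 can be illustrated on a finite example. The sketch below compares the action chosen by a Bayesian agent, who maximizes expected payoff, with the action chosen by a wishful agent, who by corollary 4 maximizes the expectation of $\exp(\rho u(a, \theta))$ under the same Bayesian posterior. The payoff numbers are hypothetical, and ties are resolved by taking the first maximizer rather than by Sender’s preference, which is a simplification of this sketch.

```python
import math

def bayesian_action(mu, payoffs):
    """Action maximizing expected payoff; payoffs[a][i] is u(a, theta_i)."""
    return max(range(len(payoffs)),
               key=lambda a: sum(m * u for m, u in zip(mu, payoffs[a])))

def wishful_action(mu, payoffs, rho):
    """Action maximizing E_mu[exp(rho * u(a, theta))], as in corollary 4."""
    return max(range(len(payoffs)),
               key=lambda a: sum(m * math.exp(rho * u) for m, u in zip(mu, payoffs[a])))

# Hypothetical payoffs: action 0 is risky (1 or -1), action 1 is safe (0.1 in both states).
example_payoffs = [[1.0, -1.0], [0.1, 0.1]]
example_posterior = [0.5, 0.5]
```

At the uniform posterior the Bayesian agent picks the safe action, since the risky action has expected payoff 0, whereas a wishful agent with $\rho = 1$ picks the risky action, in line with the reduced risk aversion discussed above.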

Corollary 4 also shows that wishful thinking materializes in the form of “motivated 
errors” in the sense of Exley and Kessler (2019): by choosing psychologically desirable 
beliefs, Receiver commits systematic errors in his decision-making, i.e., acts as if he 
had cognitive limitations or behavioral biases relative to a Bayesian decision-maker. 


2.4 Sender’s value from persuasion 


In this section, we assume that the action space of Receiver is binary, so $A = \{0,1\}$, and that Sender wants to induce $a = 1$, so $v(a) = a$. We provide necessary and sufficient conditions on Receiver's preferences under which he would take action 1 under a larger set of beliefs than a Bayesian Receiver. This allows us to compare Sender's value from persuading a wishful rather than a Bayesian Receiver as a function of the model's primitives, that is, Receiver's preferences and wishfulness.


¹⁶ This property comes from the fact that wishful beliefs take the form of a soft-max function. For the sake of completeness we provide a proof of this result in appendix B.2.
The restriction to a binary set of actions is not without loss of generality, but it improves tractability.

We start by defining the following two sets of beliefs:
$$\Delta_a^B = \{\mu \in \Delta(\Theta) : a \in A(\mu)\}$$
and
$$\Delta_a^W = \{\mu \in \Delta(\Theta) : a \in A(\eta(\mu))\},$$
for any $a \in A$. The set $\Delta_a^B$ (resp. $\Delta_a^W$) is the subset of posterior beliefs supporting action $a$ as optimal for a Bayesian (resp. wishful) Receiver. We say that an action is favored by a wishful Receiver if that action is supported as optimal on a strictly larger set of posterior beliefs by a wishful Receiver than by a Bayesian one.

Definition 4 (Favored action). An action $a \in A$ is favored by a wishful Receiver if $\Delta_a^B \subsetneq \Delta_a^W$.


Assume from now on that $\Theta = \{\underline{\theta}, \overline{\theta}\}$. We first characterize when a wishful Receiver favors action $a = 1$ when the state space is binary and show afterwards that our results extend to any finite state space. Let us denote $u(a,\underline{\theta}) = \underline{u}_a$ and $u(a,\overline{\theta}) = \overline{u}_a$ for any $a \in A$. Assume that Receiver wants to "match the state," such that $\overline{u}_1, \underline{u}_0 > \overline{u}_0, \underline{u}_1$. Define the payoff variability under action 0 by $u_0 = \underline{u}_0 - \overline{u}_0$, the payoff variability under action 1 by $u_1 = \overline{u}_1 - \underline{u}_1$ and the indicator of the highest achievable payoff by $u_{\max} = \underline{u}_0 - \overline{u}_1$. With a small abuse of notation, denote $\eta = \eta(\overline{\theta})$ and $\mu = \mu(\overline{\theta})$.

By corollary 4, comparing the behavior of a wishful Receiver with that of a Bayesian one is equivalent to comparing the behavior of two Bayesian receivers with respective payoff functions $\exp(\rho u(a,\theta))$ and $u(a,\theta)$. Thus, denote by $\mu^B$ (resp. $\mu^W(\rho)$) the belief at which a Receiver with preferences $u(a,\theta)$ (resp. $\exp(\rho u(a,\theta))$) is indifferent between the two actions. Those beliefs are respectively equal to
$$\mu^B = \frac{\underline{u}_0 - \underline{u}_1}{\underline{u}_0 - \underline{u}_1 + \overline{u}_1 - \overline{u}_0}$$
and
$$\mu^W(\rho) = \frac{\exp(\rho\underline{u}_0) - \exp(\rho\underline{u}_1)}{\exp(\rho\underline{u}_0) - \exp(\rho\underline{u}_1) + \exp(\rho\overline{u}_1) - \exp(\rho\overline{u}_0)}.$$

With only two states, a wishful Receiver favors action $a = 1$ if and only if $\mu^W < \mu^B$, since whenever that condition is satisfied a wishful Receiver takes action $a = 1$ under a larger set of beliefs than a Bayesian. The next lemma characterizes when this is the case.
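As a minimal numerical sketch of the two indifference beliefs, the wishful threshold can be computed by applying the Bayesian formula to payoffs transformed by $z \mapsto \exp(\rho z)$. The function names and the payoff values below are hypothetical and chosen only so that the "match the state" assumption holds.

```python
import math

def bayesian_threshold(u0_low, u0_high, u1_low, u1_high):
    """Belief on the high state at which a Bayesian Receiver is indifferent."""
    num = u0_low - u1_low
    return num / (num + u1_high - u0_high)

def wishful_threshold(u0_low, u0_high, u1_low, u1_high, rho):
    """Same indifference belief after transforming every payoff by exp(rho * z)."""
    e = lambda z: math.exp(rho * z)
    return bayesian_threshold(e(u0_low), e(u0_high), e(u1_low), e(u1_high))

# Hypothetical payoffs: action 1 has the highest payoff and the larger variability.
payoffs = dict(u0_low=1.0, u0_high=0.0, u1_low=0.0, u1_high=2.0)

mu_B = bayesian_threshold(**payoffs)
mu_W = wishful_threshold(rho=1.0, **payoffs)
print(mu_B, mu_W, "action 1 favored:", mu_W < mu_B)
```

In this example the wishful threshold lies below the Bayesian one, so action $a = 1$ is taken on a larger set of beliefs.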
Lemma 3. Action $a = 1$ is favored by a wishful Receiver if, and only if:

(i) $u_{\max} \le 0$ and $u_0 \le u_1$, or;

(ii) $u_{\max} < 0$, $u_0 > u_1$ and $\rho > \bar{\rho}$, or;

(iii) $u_{\max} > 0$, $u_0 < u_1$ and $\rho < \bar{\rho}$,

where $\bar{\rho}$ is a strictly positive threshold such that $\mu^W(\bar{\rho}) = \mu^B$.

Proof. See appendix B.3.


Two key aspects of Receiver’s material payoff thus determine which action he 
favors: the highest achievable payoff as well as the payoff variability for both actions. 
It is easy to grasp the importance of the highest payoff. Since the wishful thinker 
always distorts his beliefs in the direction of the most favorable outcome, in the limit, 
when there is no cost of distorting the Bayesian belief, Receiver would fully delude 
himself and always play the action that potentially yields such a payoff. The payoff
variability $u_a$, on the other hand, is precisely Receiver's marginal psychological benefit
from distorting his belief under action $a$. Hence, the higher the payoff variability
associated with action $a$, the more relevant the uncertainty about $\theta$ is when that
action is played and the bigger the marginal gain in anticipatory payoff the wishful
thinker would get from distorting beliefs.

Lemma 3 states that if an action $a$ has both the highest payoff ($\underline{u}_0$ or $\overline{u}_1$) and
the greatest payoff variability $u_a$ among all actions $a \in A$, it is always favored. If
an action has either the highest payoff or the greatest payoff variability, then the 
wishfulness parameter p defines whether or not it is favored: for high wishfulness 
the action with the highest payoff is favored, whereas for low wishfulness it is 
the action with the greatest payoff variability that is favored. The intuition is 
the following: for sufficiently high values of Receiver’s wishfulness, Receiver can 
afford stronger overoptimism about the most desired outcome, thus favoring the 
action that potentially yields this outcome despite such action not being associated 
with the highest marginal psychological benefit. In contrast, for sufficiently low 
values of p, Receiver cannot afford too much overoptimism about the most desired 
outcome. Hence, he prefers to distort beliefs at the margin that yields the highest 
marginal psychological benefit, such that the action associated with the highest 
payoff variability is favored. 
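The role of the wishfulness threshold can be illustrated numerically for a case in which action 0 yields the highest payoff while action 1 has the greatest payoff variability. The sketch below assumes that the threshold is the wishfulness level at which the wishful and Bayesian indifference beliefs coincide; the payoff values, the search bracket and the helper names are hypothetical.

```python
import math

def threshold(u0_low, u0_high, u1_low, u1_high, rho=None):
    """Indifference belief on the high state; rho=None gives the Bayesian case."""
    f = (lambda z: z) if rho is None else (lambda z: math.exp(rho * z))
    num = f(u0_low) - f(u1_low)
    return num / (num + f(u1_high) - f(u0_high))

def bisect(g, lo, hi, iters=100):
    """Plain bisection; assumes g(lo) and g(hi) have opposite signs."""
    g_lo_positive = g(lo) > 0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if (g(mid) > 0) == g_lo_positive:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Hypothetical payoffs: u_max = 1.0 - 0.9 > 0 and u_0 = 0.5 < u_1 = 1.9.
payoffs = dict(u0_low=1.0, u0_high=0.5, u1_low=-1.0, u1_high=0.9)
mu_B = threshold(**payoffs)

def gap(rho):
    """Wishful minus Bayesian threshold; negative means action 1 is favored."""
    return threshold(rho=rho, **payoffs) - mu_B

rho_bar = bisect(gap, 1.0, 50.0)
print("mu_B:", mu_B, "rho_bar:", rho_bar)
print("favored at rho = 1:", gap(1.0) < 0, "favored at rho = 50:", gap(50.0) < 0)
```

For low wishfulness the action with the greatest variability is favored, and for high wishfulness it is not, in line with the intuition given above.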


The next proposition extends lemma 3 to an arbitrary finite number of states. 
Proposition 5. Assume $\Theta$ is a finite set with more than two elements. Receiver favors action $a = 1$ if, and only if, for any pair of states $\theta, \theta' \in \Theta$, Receiver's material payoffs associated with those states and his wishfulness parameter $\rho$ satisfy one of the conditions (i), (ii) or (iii) in lemma 3.


Proof. See appendix B.4. 


Proposition 5 can easily be visualized graphically in an example with three states. Assume $\Theta = \{0,1,2\}$ and denote by $\mu^B_{\theta,\theta'}$ (resp. $\mu^W_{\theta,\theta'}$) the belief making a Bayesian (resp. wishful) Receiver indifferent between actions $a = 0$ and $a = 1$ when $\mu(\theta), \mu(\theta') > 0$ but $\mu(\theta'') = 0$, for any $\theta, \theta', \theta'' \in \Theta$. In figure 2.1 we illustrate how $\Delta_1^W$ compares to $\Delta_1^B$ when Receiver's payoff function is given by:


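| $u(a,\theta)$ | $\theta = 0$ | $\theta = 1$ | $\theta = 2$ |
| $a = 0$ | 2 | 3 | $-1$ |
| $a = 1$ |  | 0 | 4 |

[Figure 2.1: two panels, $\rho = 1$ and $\rho = 2$.]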


Figure 2.1: Comparison of supporting sets of beliefs. In blue, the set of Bayesian 
posteriors supporting action a = 1 for a Bayesian Receiver. In red, the set of Bayesian 
posteriors supporting action a = 1 for a wishful Receiver. 


Notice that for the two pairs of states $(0,2)$ and $(1,2)$, the associated payoffs satisfy property (i) in lemma 3. That is, action $a = 1$ is associated with the highest payoff $u(1,2) = 4$ as well as the highest payoff variability $u(1,2) - u(0,2) = 5$, under both pairs of states. As a consequence, lemma 3 applies whenever we focus on one of those two pairs of states and assign probability zero to the remaining state. Then, we have $\mu^W_{0,2} > \mu^B_{0,2}$ and $\mu^W_{1,2} > \mu^B_{1,2}$. Remark now that $\Delta_1^B = \mathrm{co}(\{\mu^B_{0,2}, \mu^B_{1,2}, \delta_2\})$ and $\Delta_1^W = \mathrm{co}(\{\mu^W_{0,2}, \mu^W_{1,2}, \delta_2\})$, where $\delta_\theta$ denotes the Dirac distribution on state $\theta \in \Theta$. Consequently, $\Delta_1^B \subset \Delta_1^W$, so action $a = 1$ is favored by Receiver. If none of the conditions highlighted in lemma 3 were satisfied for at least one of the pairs of states $(0,2)$ or $(1,2)$, then one of the thresholds $\mu^W_{\theta,2}$ would be less than or equal to $\mu^B_{\theta,2}$, in which case $\Delta_1^W$ would not be a superset of $\Delta_1^B$ anymore.

Let us now turn our attention to the following questions: when is Sender better off facing a wishful Receiver compared to a Bayesian one, and how does the (Blackwell) informativeness of Sender's optimal policy compare when persuading a wishful or a Bayesian Receiver? Remember that Sender chooses an information policy $\tau \in \Delta(\Delta(\Theta))$ maximizing
$$\int_{\Delta(\Theta)} v(\mu)\,\tau(d\mu),$$
where
$$v(\mu) = \begin{cases} 1 & \text{if } \mu \in \Delta_1^W, \\ 0 & \text{otherwise,} \end{cases}$$
subject to the Bayes plausibility constraint
$$\int_{\Delta(\Theta)} \mu\,\tau(d\mu) = \mu_0.$$

In the binary state case, it means that the threshold belief $\mu^W$ corresponds to the smallest Bayesian posterior Sender needs to induce to persuade a wishful Receiver to take action $a = 1$. Therefore, lemma 3 and proposition 5 have immediate consequences for Sender.


Corollary 5. Let $\Theta$ be an arbitrary finite space with at least two elements. Then, Sender always achieves a weakly higher payoff when interacting with a wishful Receiver compared to a Bayesian one for any prior $\mu_0 \in ]0,1[$ if, and only if, for any pair of states $\theta, \theta' \in \Theta$, Receiver's material payoffs associated with those states and his wishfulness parameter $\rho$ satisfy one of the conditions (i), (ii) or (iii) in lemma 3. Moreover, when the state space is binary, Sender's optimal information policy is always weakly less (Blackwell) informative than in the Bayesian case.


To illustrate corollary 5 we represent in figure 2.2 the concavifications of Sender's indirect utility when Receiver is wishful or Bayesian in two different cases. The case corresponding to lemma 3 is represented in figure 2.2a. Sender is always better off persuading a wishful compared to a Bayesian Receiver, as $V(\mu_0) \ge V^{KG}(\mu_0)$ for any $\mu_0 \in ]0,1[$. On the other hand, if Receiver's preferences or wishfulness do not satisfy any of the properties in lemma 3, then Sender is weakly worse off under any prior. This case is represented in figure 2.2b.
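In the binary-state case, the comparison of Sender's payoffs reduces to a comparison of two threshold beliefs. The sketch below assumes the standard two-point optimal policy, supported on 0 and on the threshold belief whenever the prior lies below that threshold, so that Bayes plausibility pins down the probability of inducing $a = 1$. The function name and the threshold values are hypothetical.

```python
def sender_value(prior, threshold):
    """Probability of inducing a = 1 under the optimal policy with two states.

    If the prior already reaches the threshold, no disclosure is needed.
    Otherwise the policy splits the prior between 0 and the threshold, and
    Bayes plausibility gives probability prior / threshold on the threshold.
    """
    if prior >= threshold:
        return 1.0
    return prior / threshold

# Hypothetical thresholds for a case where action 1 is favored (mu_W < mu_B).
mu_B, mu_W = 0.40, 0.25

for prior in (0.1, 0.2, 0.3, 0.5):
    print(prior, sender_value(prior, mu_B), sender_value(prior, mu_W))
```

Because the wishful threshold is lower, Sender's value is weakly higher at every prior, and strictly higher for priors below the Bayesian threshold.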
(a) At least one property in lemma 3 is satisfied. (b) No property in lemma 3 is satisfied.

Figure 2.2: Expected payoffs under optimal information policies. Red curves: expected payoffs under wishful thinking. Blue curves: expected payoffs when Receiver is Bayesian. Dashed-dotted green lines: expected payoffs under a fully revealing experiment.


When Sender wants to induce an action that is (resp. is not) favored by a wishful Receiver, persuasion is always "easier" (resp. "harder") for Sender in the following sense: Sender needs a strictly less (resp. strictly more) Blackwell informative policy than KG to persuade Receiver to take Sender's preferred action. Equivalently, if experiments were costly to produce, as in Gentzkow and Kamenica (2014), then Sender would always need to consume less (resp. more) resources to persuade a wishful Receiver to take her preferred action than a Bayesian one. The hypothesis of a binary state space facilitates the comparison, as it ensures that the Bayesian-optimal and the wishful-optimal information policies are Blackwell comparable. Although the informativeness comparisons in corollary 5 do not necessarily extend when the state space contains more than two elements, Sender's welfare comparisons, in contrast, still hold under any arbitrary finite state space.

We compare in figure 2.3 Sender's optimal information policies when Receiver is Bayesian and wishful, with the same payoff function as in figure 2.1. When the state space is finite, a policy $\tau \in \mathcal{T}(\mu_0)$ such that all elements in $\mathrm{supp}(\tau)$ are affinely independent is (weakly) more Blackwell-informative than a policy $\tau' \in \mathcal{T}(\mu_0)$ if, and only if, $\mathrm{supp}(\tau') \subset \mathrm{co}(\mathrm{supp}(\tau))$ (see Lipnowski et al., 2020, Lemma 2). The Bayesian-optimal policy $\tau^B$ and the wishful-optimal policy $\tau^W$ each have a two-point support, so $\mathrm{co}(\mathrm{supp}(\tau^W))$ is the segment joining the two posteriors in $\mathrm{supp}(\tau^W)$. It is visible in figure 2.3 that $\mathrm{supp}(\tau^B) \not\subset \mathrm{co}(\mathrm{supp}(\tau^W))$. Hence, $\tau^B$ and $\tau^W$ are not Blackwell comparable. However, since Sender is interested in inducing action $a = 1$ and Receiver favors that action, Sender's expected payoff is higher for any prior when Receiver is wishful.

Figure 2.3: The Bayesian-optimal policy $\tau^B$ (in blue) vs. the wishful-optimal policy $\tau^W$ (in red), with their respective two-point supports.


2.5 Applications 


In this section, we show through three applications that corollary 5 can have important economic consequences.


2.5.1 Information provision and preventive health care 


A public health agency (Sender) informs an individual (Receiver) about the prevalence of a certain disease. Receiver forms beliefs about the infection risk, which can be either high or low: $0 < \underline{\theta} < \overline{\theta} < 1$. The probability of contracting the illness also depends on whether the individual adopts a preventive treatment or not, where $a = 1$ designates adoption. Investment in the treatment entails a cost $c > 0$ to Receiver.¹⁷ Moreover, let us assume that the effectiveness of the treatment, i.e., the probability that the treatment works, is $\alpha \in [0,1]$, so that the probability of falling ill, conditional on adoption, is $(1-\alpha)\theta$. The payoff from staying healthy is normalized to 0, whereas the payoff from being infected equals $-\varsigma < 0$, where $\varsigma$ is the severity of the disease.

¹⁷ One might interpret that cost as the price of the treatment or as the material or psychological cost of undertaking medical procedures.

Receiver's payoff function is
$$u(a,\theta) = (1-a)(-\varsigma\theta) + a(-(1-\alpha)\theta\varsigma - c)$$
for any $(a,\theta) \in A \times \Theta$. We assume that $\varsigma\alpha\underline{\theta} < c < \varsigma\alpha\overline{\theta}$, so Receiver faces a trade-off: he would prefer not to invest if he were sure the probability of infection was low and, conversely, would prefer to invest in the treatment if he were sure the risk of infection was high. Also remark that Receiver always expects to experience a negative payoff, as $u(a,\theta) < 0$ for any $(a,\theta) \in A \times \Theta$.

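The public health agency wants to maximize the probability of individuals adopting the preventive treatment.¹⁸ The agency informs individuals about the prevalence of the disease by designing and committing to a Bayes-plausible information policy $\tau$. A Bayesian Receiver would be indifferent between adopting the treatment or not at belief
$$\mu^B = \frac{c - \alpha\underline{\theta}\varsigma}{\alpha(\overline{\theta} - \underline{\theta})\varsigma}.$$
In contrast, by proposition 4 and corollary 4, the equilibrium beliefs and behavior of a wishful Receiver are given by
$$\eta(\mu) = \begin{cases} \dfrac{\mu\exp(-\rho\varsigma(\overline{\theta} - \underline{\theta}))}{\mu\exp(-\rho\varsigma(\overline{\theta} - \underline{\theta})) + 1 - \mu} & \text{if } \mu < \mu^W, \\[2ex] \dfrac{\mu\exp(-\rho(1-\alpha)\varsigma(\overline{\theta} - \underline{\theta}))}{\mu\exp(-\rho(1-\alpha)\varsigma(\overline{\theta} - \underline{\theta})) + 1 - \mu} & \text{if } \mu \ge \mu^W, \end{cases}$$
and
$$a(\eta(\mu)) = \mathbb{1}\{\mu \ge \mu^W\}$$
for any posterior belief $\mu \in [0,1]$, where
$$\mu^W = \frac{\exp(-\rho\underline{\theta}\varsigma) - \exp(\rho(-(1-\alpha)\underline{\theta}\varsigma - c))}{\exp(-\rho\underline{\theta}\varsigma) - \exp(\rho(-(1-\alpha)\underline{\theta}\varsigma - c)) + \exp(\rho(-(1-\alpha)\overline{\theta}\varsigma - c)) - \exp(-\rho\overline{\theta}\varsigma)}.$$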


We illustrate the belief distortion of Receiver in figure 2.4a. Receiver is always overoptimistic about his probability of staying healthy, as $\eta(\mu) < \mu$ for any $\mu \in ]0,1[$. Remark that non-adoption is associated with the highest possible payoff $-\varsigma\underline{\theta}$ as well as the highest payoff variability $\varsigma(\overline{\theta} - \underline{\theta})$. Accordingly, by lemma 3, Receiver always favors non-adoption, as figure 2.4b illustrates. As a result of corollary 5, Sender always needs to induce higher beliefs for Receiver to adopt the treatment than she would need if she faced a Bayesian agent, all the more so as Receiver's wishfulness $\rho$ becomes larger. Therefore, in this example, overoptimism of Receiver always goes against Sender's interest.

¹⁸ Maximizing the probability of adoption is a sensible objective since most infections cause negative externalities due to their transmission through social interactions. Therefore, a benevolent planner who wants to reduce the likelihood of transmission of an infection would do well to maximize the rate of adoption of the preventive treatment (for example, maximize condom distribution to control AIDS transmission, maximize injection of vaccines to control viral infections, or maximize mask use to control the spread of airborne diseases).

(a) Equilibrium belief $\eta(\mu)$ as a function of $\mu$. (b) Behavioral threshold $\mu^W$ as a function of $\rho$.

Figure 2.4: The belief correspondence for $\varsigma = 2$, $c = 0.5$, $\alpha = 0.8$, $\underline{\theta} = 0.1$, $\overline{\theta} = 0.9$ and $\rho = 2$. Receiver is always overoptimistic concerning his health risk for any induced posterior, except at $\mu = 0$ or $\mu = 1$. Moreover, the belief threshold $\mu^W$ as a function of $\rho$ is strictly increasing and admits $\mu^B$ as a lower bound.
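The thresholds of this application can be evaluated at the parameter values reported for figure 2.4. The sketch below assumes the soft-max form of wishful beliefs and the payoff function $u(a,\theta)$ of this subsection; the function and variable names are hypothetical.

```python
import math

# Parameter values reported for figure 2.4.
severity, cost, alpha = 2.0, 0.5, 0.8
theta_low, theta_high, rho = 0.1, 0.9, 2.0

def payoff(a, theta):
    """Receiver's material payoff u(a, theta) in the health application."""
    return (1 - a) * (-severity * theta) + a * (-(1 - alpha) * theta * severity - cost)

def threshold(transform):
    """Belief on the high-risk state at which adoption becomes optimal."""
    f = lambda a, theta: transform(payoff(a, theta))
    num = f(0, theta_low) - f(1, theta_low)
    return num / (num + f(1, theta_high) - f(0, theta_high))

def wishful_belief(mu, mu_W):
    """Soft-max belief on the high-risk state under the action taken at mu."""
    a = 1 if mu >= mu_W else 0
    ratio = math.exp(rho * (payoff(a, theta_high) - payoff(a, theta_low)))
    return mu * ratio / (mu * ratio + 1 - mu)

mu_B = threshold(lambda z: z)
mu_W = threshold(lambda z: math.exp(rho * z))
print("mu_B:", mu_B, "mu_W:", mu_W)
print("belief at mu = 0.5:", wishful_belief(0.5, mu_W))
print("belief at mu = 0.8:", wishful_belief(0.8, mu_W))
```

At these values the wishful threshold is well above the Bayesian one, and the distorted belief on the high-risk state is below the Bayesian posterior whichever action is taken.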

It is interesting to see how Sender's probability of inducing the adoption of the treatment evolves with respect to the severity of the disease $\varsigma$, as well as the effectiveness of the treatment $\alpha$.¹⁹ We represent in figure 2.5b the probability that Sender induces adoption of the treatment under the optimal information policy as a function of $\varsigma$. Notice that the probability of inducing adoption is less sensitive to the severity of the disease, i.e., becomes "flatter," when facing a wishful Receiver compared to the Bayesian when the treatment becomes less effective. The intuition is the following: when the treatment is fully effective, i.e., $\alpha = 1$, Receiver's payoff in case he invests in the treatment becomes state independent. Therefore, he does not have any incentive to distort beliefs when taking action $a = 1$. As a result, $\mu^W$ decreases and Receiver holds perfectly Bayesian beliefs when $\mu \ge \mu^W$. However, whenever there is uncertainty about the treatment efficacy, i.e., $\alpha < 1$, uncertainty about infection risk matters and gives room to belief distortion even when taking the treatment. Decreasing $\alpha$ increases the anticipated anxiety of Receiver, leading to more optimistically biased beliefs and a higher $\mu^W$, which, in turn, complicates persuasion for Sender for any severity $\varsigma$. Remark in figure 2.5b that $\tau$ decreases sharply with $\alpha$ for a fixed $\varsigma$. In fact, one could show that as $\alpha$ decreases, $\tau$ becomes closer and closer to $\mu_0$ for any $\varsigma$, meaning that the agency cannot achieve a substantially higher payoff than under full disclosure.²⁰

¹⁹ This probability is pinned down by the Bayes-plausibility constraint and equal to $\tau^{KG} = \mu_0/\mu^B$ in the Bayesian case and $\tau^W = \mu_0/\mu^W$ in the wishful case.

(a) Behavioral thresholds $\mu^B$ (in blue) and $\mu^W$ (in red) as functions of severity $\varsigma$. (b) Probability $\tau$ of inducing treatment adoption as a function of severity $\varsigma$.

Figure 2.5: Red (resp. blue) curves correspond to wishful (resp. Bayesian) Receiver. We set parameters to $c = 0.5$, $\alpha = 0.8$, $\underline{\theta} = 0.1$, $\overline{\theta} = 0.9$ and $\rho = 2$. Full lines correspond to the case where $\alpha = 1$ whereas dashed curves correspond to $\alpha = 0.8$.

In the next subsection we extend our framework to the case of a continuous state space and linear preferences. We show that results in the finite state space case extend to this setting. We also highlight why we might expect persuasion to be more effective in the context of risky investment decisions.


2.5.2 Persuading a wishful investor 


A financial broker (Sender) designs reports about the return of some risky financial product to inform a potential client (Receiver). The return of the product is $\theta \in \Theta = [\underline{\theta}, \overline{\theta}]$, where $\underline{\theta} < 0 < \overline{\theta}$. Returns are distributed according to the prior distribution $\mu_0$. Let $F$ be the cumulative distribution function associated with $\mu_0$ and let us assume that $\mu_0$ admits a continuous and strictly positive density function $f$ over $[\underline{\theta}, \overline{\theta}]$. Receiver has some savings he is willing to invest and chooses action $a \in A = \{0,1\}$, where $a = 0$ represents not investing, in which case Receiver's payoff is 0, and $a = 1$ represents investing, in which case Receiver's payoff is the realized return $\theta$. The broker is remunerated on the basis of a flat fee $v > 0$ that is independent of the product's true profitability. Hence, Receiver's payoff is $u(a,\theta) = a\theta$ while Sender's payoff is $v(a,\theta) = va$ for any $(a,\theta) \in A \times \Theta$.

²⁰ One additional implication of this result is the following. Assume the true treatment efficacy is $\alpha$ but Receiver perceives the efficacy to be $\hat{\alpha} < \alpha$ (e.g., because Receiver adheres to anti-vaccine movements or generally mistrusts the pharmaceutical industry). In that case, the doubts expressed by Receiver about the treatment efficacy make him even more anxious, which, in turn, makes belief distortion stronger and thus reduces the effectiveness of the agency's information policy whatever the severity of the disease.
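Receiver forms motivated beliefs about the return of the financial product. By proposition 4 his equilibrium beliefs are given by
$$\eta(\mu)(\tilde{\Theta}) = \begin{cases} \mu(\tilde{\Theta}) & \text{if } \displaystyle\int_{\Theta} \exp(\rho\theta)\,\mu(d\theta) < 1, \\[2ex] \dfrac{\int_{\tilde{\Theta}} \exp(\rho\theta)\,\mu(d\theta)}{\int_{\Theta} \exp(\rho\theta)\,\mu(d\theta)} & \text{if } \displaystyle\int_{\Theta} \exp(\rho\theta)\,\mu(d\theta) \ge 1, \end{cases}$$
for any $\mu \in \Delta(\Theta)$ and any Borel set $\tilde{\Theta} \subseteq \Theta$, and, by corollary 4, his equilibrium behavior is given by
$$a(\eta(\mu)) = \mathbb{1}\left\{\int_{\Theta} \exp(\rho\theta)\,\mu(d\theta) \ge 1\right\}.$$
Therefore, Sender's indirect utility is equal to
$$v(\mu) = v\,\mathbb{1}\left\{\int_{\Theta} \exp(\rho\theta)\,\mu(d\theta) \ge 1\right\},$$
for any $\mu \in \Delta(\Theta)$. To make the problem interesting, we assume that neither a Bayesian nor a wishful Receiver would take action $a = 1$ under the prior. That is, $\hat{m} = \int_{\underline{\theta}}^{\overline{\theta}} \theta\,\mu_0(d\theta) < 0$ and $\hat{x} = \int_{\underline{\theta}}^{\overline{\theta}} \exp(\rho\theta)\,\mu_0(d\theta) < 1$.²¹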

Under these assumptions, remark that a signal structure $\sigma$ that induces a distribution $\tau$ over posterior beliefs $\mu$ matters for Receiver and Sender only through the distribution of exponential moments $x = \int_{\Theta} \exp(\rho\theta)\,\mu(d\theta)$ it induces. Let $X$ be the space of such moments, that is, $X = \mathrm{co}(\exp(\rho\Theta))$, where $\exp(\rho\Theta)$ is the image of the function $\theta \mapsto \exp(\rho\theta)$ for all $\theta \in [\underline{\theta}, \overline{\theta}]$. That is, $X = [\underline{x}, \overline{x}]$ where $\underline{x} = \exp(\rho\underline{\theta})$ and $\overline{x} = \exp(\rho\overline{\theta})$. Let $G$ be the prior cumulative distribution function over the random variable $\exp(\rho\theta)$ induced by $F$, that is
$$G(x) = F\left(\frac{\ln x}{\rho}\right)$$
for any $x \in [\underline{x}, \overline{x}]$. By standard arguments (Gentzkow and Kamenica, 2016), the problem of finding an optimal signal structure $\sigma$ reduces to finding a cumulative distribution function $H$ that maximizes
$$\int_{\underline{x}}^{\overline{x}} v\,\mathbb{1}\{x \ge 1\}\,dH(x)$$
subject to
$$\int_{\underline{x}}^{z} H(x)\,dx \le \int_{\underline{x}}^{z} G(x)\,dx$$
for every $z \in [\underline{x}, \overline{x}]$. The solution to such a problem is well known and can be found either using techniques from optimization under stochastic dominance constraints (Gentzkow and Kamenica, 2016; Ivanov, 2020; Kleiner et al., 2021) or linear programming (Kolotilin, 2018; Dworczak and Martini, 2019; Dizdar and Kováč, 2020). In our context, the optimal signal is a binary partition of the state space. That is, the broker reveals whether the return is above or below some threshold state.

²¹ It is in fact always true that $\hat{m} < 0$ when $\hat{x} < 1$. Hence, assuming $\hat{m} < 0$ in addition to $\hat{x} < 1$ is without loss.


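Proposition 6. There exists a unique $\theta^W \in [\underline{\theta}, \overline{\theta}]$ verifying
$$\frac{1}{1 - F(\theta^W)} \int_{\theta^W}^{\overline{\theta}} \exp(\rho\theta) f(\theta)\,d\theta = 1$$
and such that Sender pools all states $\theta \in [\theta^W, \overline{\theta}]$ under the same signal $s = 1$, i.e., $\sigma(1|\theta) = 1$ for all $\theta \in [\theta^W, \overline{\theta}]$, and similarly pools all states $\theta \in [\underline{\theta}, \theta^W[$ under the same signal $s = 0$. Hence, the probability of inducing action $a = 1$ for Sender is equal to
$$\int_{\underline{\theta}}^{\overline{\theta}} \sigma(1|\theta) f(\theta)\,d\theta = 1 - F(\theta^W).$$

Proof. See Ivanov (2020), Section 3.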


It is optimal for Sender to partition the state space at the threshold state making Receiver indifferent between investing or not at the prior. Such an information policy can intuitively be seen as the investment recommendation rule which maximizes the probability that Receiver invests given the prior distribution of returns $F$.

Using the exact same arguments as above, one can deduce that the probability of inducing action $a = 1$ when Receiver is Bayesian is given by $1 - F(\theta^B)$, where $\theta^B$ is the unique threshold verifying the equation
$$\frac{1}{1 - F(\theta^B)} \int_{\theta^B}^{\overline{\theta}} \theta f(\theta)\,d\theta = 0.$$
Therefore, Sender is more effective at persuading a wishful Receiver if and only if $\theta^W < \theta^B$.

Proposition 7. It is always true that $\theta^W < \theta^B$. Hence, Sender is always more effective at persuading a wishful rather than a Bayesian investor.


Proof. See appendix B.6. 


The above result relates to proposition 5: buying the risky product is favored by the wishful investor since it is the action that yields both the highest possible payoff and the highest payoff variability. This example thus illustrates how the results in the finite state space case naturally extend to an infinite state space setting with linear preferences. It further helps explain the pervasiveness of persuasion efforts in financial and betting markets, illustrating why some financial consulting firms seem to specialize in advice misconduct and cater to biased consumers.
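The two cutoff conditions can be solved numerically once a prior is specified. The sketch below assumes, purely for illustration, a uniform prior on $[\underline{\theta}, \overline{\theta}] = [-1, 0.5]$ and $\rho = 1$, for which both conditional moments have closed forms and both assumptions on the prior moments hold. The helper names and parameter values are hypothetical.

```python
import math

def bisect(g, lo, hi, iters=200):
    """Plain bisection; assumes g(lo) and g(hi) have opposite signs."""
    g_lo_positive = g(lo) > 0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if (g(mid) > 0) == g_lo_positive:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def wishful_cutoff(lo, hi, rho):
    """Cutoff t with E[exp(rho * theta) | theta >= t] = 1 under a uniform prior."""
    g = lambda t: (math.exp(rho * hi) - math.exp(rho * t)) / (rho * (hi - t)) - 1.0
    return bisect(g, lo, hi - 1e-9)

def bayesian_cutoff(lo, hi):
    """Cutoff t with E[theta | theta >= t] = 0 under a uniform prior."""
    g = lambda t: 0.5 * (t + hi)
    return bisect(g, lo, hi - 1e-9)

def invest_probability(cutoff, lo, hi):
    """1 - F(cutoff) for the uniform prior on [lo, hi]."""
    return (hi - cutoff) / (hi - lo)

lo, hi, rho = -1.0, 0.5, 1.0
theta_W = wishful_cutoff(lo, hi, rho)
theta_B = bayesian_cutoff(lo, hi)
print("theta_W:", theta_W, "theta_B:", theta_B)
print("P(invest), wishful:", invest_probability(theta_W, lo, hi))
print("P(invest), Bayesian:", invest_probability(theta_B, lo, hi))
```

In this example the wishful cutoff lies below the Bayesian one, so the broker induces investment with higher probability when the investor is wishful.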


2.5.3 Public persuasion and political polarization


A Sender (e.g., a politician, a lobbyist) persuades an odd-numbered finite group of voters $N = \{1,\dots,n\}$ (e.g., a committee or members of parliament) to adopt a proposal $x \in X = \{0,1\}$, where $x = 0$ corresponds to the status quo. The state space is binary, $\Theta = \{0,1\}$, and the audience uses only the information disclosed by Sender to vote on the proposal. Let $a^i \in A = \{0,1\}$ be the ballot cast by voter $i$, where $a^i = 0$ designates voting for the status quo. The proposal is accepted if it is supported by a simple majority of voters. We assume Sender is only interested in the proposal being accepted, so her utility is $v(x) = x$. In contrast, any voter $i \in N$ has payoff function
$$u_i(x,\theta) = x\theta\beta^i + (1-x)(1-\theta)(1-\beta^i)$$
for any $(x,\theta) \in X \times \Theta$, where $\beta^i \in [0,1]$ parametrizes the partisan preference of voter $i$. That is, all voters agree that the proposal should be implemented only when $\theta = 1$, but they vary in how much they value the implementation of the proposal. We assume $\beta$ is symmetrically distributed around 1/2 in the population. Denote by $\beta^m = 1/2$ the median voter's preference.
All voters form wishful beliefs and $\rho$ is assumed homogeneous among the electorate. As a result, the direction as well as the magnitude of voters' belief distortion depend only on their partisan preferences $\beta^i$.²² By proposition 4, voter $i$'s belief under posterior $\mu \in [0,1]$ is given by
$$\eta(\mu,\beta^i) = \begin{cases} \dfrac{\mu}{\mu + (1-\mu)\exp(\rho(1-\beta^i))} & \text{if } \mu < \mu^W(\beta^i), \\[2ex] \dfrac{\mu\exp(\rho\beta^i)}{\mu\exp(\rho\beta^i) + (1-\mu)} & \text{if } \mu \ge \mu^W(\beta^i), \end{cases}$$
where
$$\mu^W(\beta^i) = \frac{\exp(\rho(1-\beta^i)) - 1}{\exp(\rho(1-\beta^i)) + \exp(\rho\beta^i) - 2}.$$
Remark that, similarly to Alonso and Câmara (2016), since the policy space is binary and voters do not hold private information, there is no room for strategic voting in our model. Hence, citizen $i$'s voting strategy under belief $\eta(\mu,\beta^i)$ is given by
$$a(\eta(\mu,\beta^i)) = \mathbb{1}\{\mu \ge \mu^W(\beta^i)\}.$$
Due to the heterogeneity in $\beta$, there is always some level of belief polarization among wishful voters for any $\mu \in ]0,1[$. Let us measure such polarization by the sum of the absolute differences between each pair of beliefs in the audience
$$\pi(\mu) = \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} |\eta(\mu,\beta^i) - \eta(\mu,\beta^j)| \tag{2.6}$$
for any $\mu \in [0,1]$.


Proposition 8. Under Sender's optimal information policy, the signal that leads to the implementation of the proposal also generates the maximum polarization among voters.

Proof. See appendix B.5.


To build an intuition for why this is the case, first note that, in our model, belief polarization and action polarization are closely related. Agents voting for the implementation of the proposal distort their beliefs upwards, whereas agents voting for the status quo distort their beliefs downwards. We can thus see that maximum belief polarization should be attained for some belief at which action polarization is maximized, that is, for some belief at which $(n+1)/2$ agents are voting one way and the remaining $(n-1)/2$ are voting the other way. This is the case for any $\mu \in [\mu^W(\beta^m), \mu^W(\beta^{m+1})[$.

²² It has been shown in psychology (Babad et al., 1992; Babad, 1995, 1997) as well as in behavioral economics (Thaler, 2020) that voters' political beliefs are often motivated by their partisan orientation.

Due to sincere voting, the result of the election always coincides with the vote of the median voter under posterior belief $\mu$. Accordingly, Sender's indirect utility is
$$v(\mu) = \mathbb{1}\{\mu \ge \mu^W(\beta^m)\},$$
for any $\mu \in [0,1]$. The optimal information policy for Sender is thus supported on $\{0, \mu^W(\beta^m)\}$ whenever $\mu_0 \in ]0, 1/2[$, and on $\{\mu_0\}$ whenever $\mu_0 \in [\mu^W(\beta^m), 1[$. The posterior $\mu^W(\beta^m)$, which leads to the implementation of the proposal, belongs to the interval $[\mu^W(\beta^{m-1}), \mu^W(\beta^{m+1})[$ and, as such, is in the neighbourhood of the belief that maximizes polarization for any distribution of preferences. When such distribution is symmetric around the median voter, polarization is maximized exactly at the middle point of that interval, which is $\mu^W(\beta^m)$.

We illustrate proposition 8 in figure 2.6 in a setup with 3 voters. Following corollary 4, wishful thinking induces voters to switch from disapproval to approval at different Bayesian posteriors $\mu^W(\beta^i)$. The optimal information policy $\tau$ for Sender is the one that maximizes the probability of the median voter voting for approval. That is, $\mathrm{supp}(\tau) = \{0, \mu^W(\beta^m)\}$ and $\mu^W(\beta^m) = 1/2$ is induced with probability $\mu_0/\mu^W(\beta^m)$ whenever $\mu_0 \in ]0, \mu^W(\beta^m)[$, and $\mathrm{supp}(\tau) = \{\mu_0\}$ whenever $\mu_0 \in [\mu^W(\beta^m), 1[$.

Figure 2.6: Belief distortions in the electorate for $\rho = 2$, $\beta^1 = 1/4$, $\beta^2 = 1/2$ and $\beta^3 = 3/4$. Polarization equals $\pi(\mu) = 2|\eta(\mu,\beta^1) - \eta(\mu,\beta^3)|$, which is maximized at $\mu^W(\beta^2) = 1/2$.

Let us now turn to polarization. First, it is quite easy to see in figure 2.6 that
$$\pi(\mu) = 2|\eta(\mu,\beta^1) - \eta(\mu,\beta^3)|$$
for any $\mu \in [0,1]$, as the distances to the median belief add up to $|\eta(\mu,\beta^1) - \eta(\mu,\beta^3)|$. Thus, it suffices to check where $|\eta(\mu,\beta^1) - \eta(\mu,\beta^3)|$ is maximized. Quite naturally, polarization is maximized when the posterior belief induced by Sender is in between $\mu^W(\beta^3)$ and $\mu^W(\beta^1)$. In particular, it is maximized exactly at the posterior belief $\mu^W(\beta^2) = 1/2$, which is precisely the posterior belief Sender induces to obtain the approval of the proposal under her optimal policy.
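The three-voter example can be checked numerically. The sketch below assumes the soft-max beliefs and voting thresholds of this subsection with $\rho = 2$ and partisan preferences $1/4$, $1/2$ and $3/4$, and evaluates the polarization measure (2.6) on a grid of posteriors. The function names and the grid are hypothetical choices.

```python
import math

def vote_threshold(beta, rho):
    """Posterior at which a voter with preference beta switches to approval."""
    up = math.exp(rho * (1 - beta))
    return (up - 1) / (up + math.exp(rho * beta) - 2)

def wishful_belief(mu, beta, rho):
    """Voter's distorted belief that the state is 1, given posterior mu."""
    if mu >= vote_threshold(beta, rho):
        w = mu * math.exp(rho * beta)          # approves: distorts upwards
        return w / (w + 1 - mu)
    return mu / (mu + (1 - mu) * math.exp(rho * (1 - beta)))  # rejects: downwards

def polarization(mu, betas, rho):
    """Sum of absolute pairwise belief differences, as in equation (2.6)."""
    beliefs = [wishful_belief(mu, b, rho) for b in betas]
    n = len(beliefs)
    return sum(abs(beliefs[i] - beliefs[j]) for i in range(n) for j in range(i + 1, n))

rho, betas = 2.0, [0.25, 0.5, 0.75]
grid = [i / 1000 for i in range(1001)]
mu_star = max(grid, key=lambda mu: polarization(mu, betas, rho))

print("median threshold:", vote_threshold(0.5, rho))
print("polarization-maximizing posterior on the grid:", mu_star)
print("maximum polarization:", polarization(mu_star, betas, rho))
```

On this grid, the posterior that maximizes polarization coincides with the median voter's threshold, which is the posterior Sender induces to obtain approval.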

Proposition 8 establishes that the intuition developed in this example is generally valid when the partisan preferences of voters are symmetrically distributed around the median. In other words, attempts by a rational sender to maximize the probability of approval induce, as an externality, maximal belief polarization among wishful voters. This result differs from the literature studying the possible heterogeneity of beliefs due to deliberate attempts at persuasion, which tends to focus on polarization arising from differential access to information.²³ Our model gives an alternative mechanism for the rise of polarization, based on motivated beliefs: a sender can induce polarization involuntarily when her message is subject to motivated interpretations, and such polarization might be especially large whenever the sender's strategy involves targeting an agent with a median preference.


2.6 Conclusion 


In this paper we study optimal persuasion in the presence of a wishful Receiver. By modeling wishful thinking as a process that optimally trades off gains in anticipatory utility against the cost of distorting beliefs, we characterize the correspondence between wishful and Bayesian beliefs, highlighting the particularities that such a belief formation process entails.

In particular, we show that wishful thinking impacts behavior, causing some actions to be favored in the sense that they are taken at a larger set of beliefs. This has important implications for the strategic design of information, as it adds some nuance to the way preferences and information determine behavior. Concretely, we show that, in the presence of wishful thinking, persuasion is more effective when it is aimed at inducing actions that are risky but can potentially yield a very large payoff and less effective when it is aimed at inducing more cautious actions. We use this model to illustrate why information disclosure seems less effective than expected at inducing preventive health behavior and more effective than expected at inducing dubious financial investments. Wishful thinking opens a channel for preferences to interfere in belief formation, raising the question of what kind of belief polarization we could observe in a population in which agents have access to the same information but vary in their preferences. We show in an application that an information designer interested in the approval of a proposal would, by optimally targeting the median voter in her choice of signal structure, induce, as an externality, maximum polarization among the electorate whenever the proposal is approved.

Some studies already investigate the effects of wishful thinking on the outcomes of strategic interactions (see Yildiz, 2007; Banerjee et al., 2020; Heller and Winter, 2020). Further investigation of the ways in which individual preferences might impact information processing, and of how these may impact social phenomena such as belief polarization in non-strategic and strategic settings, seems to be a promising path for future research.

²³ See Arieli and Babichenko (2019) for general considerations on the private persuasion of multiple receivers and see Chan et al. (2019) for an application to voting.
Chapter 3


Text and Subtext 


Abstract 


We study a persuasion problem in which a sender faces an audience
that is heterogeneous both in its preferences and in the extent to
which its members understand messages. The sender is able to exploit such
heterogeneity to convey some information privately to some receivers
(the subtext), but is constrained by the publicly understood aspects of
her own communication strategy (the text). We characterize the set of
joint distributions of posteriors that the sender can feasibly induce and
show that the sender's value from the problem can be retrieved through
a recursive concavification procedure.


JEL classification codes: D82, D83, D90. 


Keywords: Information design; persuasion; language; bounded rationality. 


* We thank Victor Augias, Jeanne Hagenbach and Eduardo Perez for the helpful discussions, as
well as seminar audiences at Sciences Po.
3.1 Introduction


Metaphysics should be written with accurate definitions and demonstrations, but 
nothing should be demonstrated in it that conflicts too much with received opinions. 
For thus this metaphysics will be able to be received. If it is once approved, 
then afterwards, if any examine it more profoundly, they will draw the necessary
consequences themselves.

-- Gottfried Wilhelm Leibniz¹


In many instances of economic and political life, communication with a plurality of 
receivers is neither purely public nor purely private: some aspects of the information 
transmitted might be commonly understood — what we refer to as the text —, whereas 
finer aspects — the subtext — might only be observed by a subset of the audience. 
These settings allow for a mixed mode of communication, one that is more permissive 
than public communication, as it allows for some information to be transmitted 
privately through the subtext, but more restrictive than private communication, 
since one cannot target privately the receivers who only have access to the text. 

Think for instance of a hierarchical organization, where messages sent to lower
ranks of the organization might also be observed by the upper ranks, whereas
messages sent to upper ranks are not observed by those in lower echelons. In this
case, communication with the members of the organization exhibits varying degrees
of refinement along the organization's ranks: whereas members of the
lowest echelon only have access to information that is public within the organization
(the text), members of higher ranks have access to varying degrees of additional
information (the subtext).

Another example of such a mixed mode of communication is what is termed in
politics dog-whistling: the use of coded language designed to signal something to
some groups (those who recognise the term) without antagonizing others (those who
don't). An example, taken from Albertson (2015), illustrates how such a communication
strategy might be used: in his 2003 State of the Union Address, George W. Bush
declared that “there is power, wonder-working power, in the goodness and idealism
and faith of the American people”. While most listeners would not infer any
particular meaning from this phrase, evangelical listeners could recognize the term
“wonder-working power” from a popular hymn, and thus perceive in this term a
signal addressed to them. While appealing explicitly to evangelicals could alienate part of the
audience, doing so in a coded manner enabled Bush to convey some information
privately to some members of the audience.

¹ Leibniz continues: “In this metaphysics, it will be useful for there to be added here and there
the authoritative utterances of great men, who have reasoned in a similar way; especially when
these utterances contain something that seems to have some possible relevance to the illustration of
a view”.

The aim of this paper is to study communication in multi-receiver settings where
the audience, either due to the organizational structure within which its members are
embedded or due to heterogeneity in receivers' ability to decode messages, exhibits
varying degrees of refinement with respect to the information it has access
to. We study a model in which a sender designs an information structure to persuade
an audience to act in a certain way. Members of the audience vary in their preferences,
but also in how finely they are able to extract information from the realized message.
As in Blume and Board (2013), differences in refinement are modeled as differences
in receivers' ability to distinguish between different messages.

Such heterogeneity in refinement gives the sender leeway to convey some information
privately through the subtext. What the sender can convey through the subtext,
however, is constrained by what is conveyed through the text. In section 3.3.1
we characterize the joint distributions of posterior beliefs that can be induced by
the sender: these are any joint distribution of posteriors such that i) the expected
posterior of the coarsest receiver is equal to the prior and ii) conditional on the
realization of a given posterior (call it $\mu$) for some receiver, the expected posterior
of any more refined receiver is equal to $\mu$. We then show that the maximum payoff
that the sender can achieve in the persuasion problem can be retrieved by a process
of “recursive concavification”, which is formally defined in section 3.3.2.


3.1.1 Related Literature 


This paper relates to the literature exploring the role of limitations to communication 
in information transmission, and in particular on information design (see Bergemann 
and Morris (2019)). 

Our setting is close to the one studied in Kamenica and Gentzkow (2011), 
who characterize optimal experiments under public communication. We expand 
their characterization of feasible distributions of posteriors and their concavification 
method to settings where communication is neither purely public nor purely private. 
Aybas and Turkel (2022) study a persuasion problem where the number of available 
messages is smaller than the number of states of the world or actions of the receiver, 
such that communication is inherently coarse. As in the present paper, they show 
that the value of persuasion is given by a modified concave envelope of the sender’s 
indirect utility. This limitation in the set of available messages is also present in 
Le Treust and Tomala (2019). 

Another related problem can be found in Bloedel and Segal (2018), who study
a persuasion problem with a rationally inattentive receiver. As in our paper,
coarseness in the receiver's understanding is central to their analysis, but their focus
is on endogenizing such coarseness through attention whereas our focus is on the
role of heterogeneity in such coarseness across different receivers. Other limitations
on receivers' interpretation have also been explored in the literature (Eliaz et al.,
2021; Levy et al., 2022; Schwartzstein and Sunderam, 2021).

The idea of limited language has also been used to study communication in 
several different contexts. Blume and Board (2013) study the role of limitations to 
language in the context of coordination games. Like the present paper, they model 
language competence as partitions of the set of available messages, although for most 
of their results they focus on a class of partitions that is more restricted than the 
one considered in the present paper. Hagenbach and Koessler (2020) study limited 
language competence on the part of the sender in cheap talk games. 

Our paper also has a close dialogue with the literature on information design in 
networks (Galperti and Perego, 2020; Egorov and Sonin, 2020; Corrao, 2021; Liporace, 
2021), in which receivers are embedded in a network describing how information 
“leaks” between one receiver and another. This raises questions regarding optimal 
seeding, privacy, and so on. The model presented in this paper can be seen as one that
analyses a particular network structure, in which information flows in a particular
direction, resulting in receivers that can be ordered in terms of their information.


3.2 Model


3.2.1 Setup 


A sender designs an information structure to persuade the members of an audience
to act in a certain way. All relevant uncertainties are summarized by the state of the
world $\omega$ belonging to a finite set $\Omega$ and all players have a common prior $\mu_0 \in \Delta(\Omega)$
with full support.

The audience is composed of $n$ receivers $i \in \{1, \dots, n\}$, with different receivers
potentially differing in their preferences and their partial understanding of the
messages. Receiver $i$ has preferences $u_i : A_i \times \Omega \to \mathbb{R}$, where $A_i$ is the set of actions
from which the receiver chooses.

Sender's preferences are given by $v : A \to \mathbb{R}$, where $A = A_1 \times \dots \times A_n$. We
assume that $v$ is additively separable in receivers' actions such that we can write
$v(a_1, \dots, a_n) = \sum_{i} v_i(a_i)$. Sender can design an information structure $\sigma : \Omega \to \Delta(M)$
to inform the audience, where $M$ is a fixed set of messages.
3.2.2 Partial Understandings


Each receiver is endowed with an understanding, which defines whether that receiver
is able to differentiate between any two messages $m, m' \in M$. Formally, receiver $i$'s
understanding is a partition $P_i = \{p_i^1, \dots, p_i^r\}$, that is, a collection of nonempty
disjoint subsets of $M$ that completely cover $M$.

Receiver $i$ is unable to distinguish between messages that belong to the same
partition element of $P_i$, and as such must update his beliefs in the same way
following the realization of any such messages. Denote by $p_i(m)$ the element of $P_i$ that
includes a message $m$. Following the realization of $m$, receiver $i$ forms the posterior belief:

$$\mu_i(\omega \mid p_i(m)) = \frac{\sigma(p_i(m) \mid \omega)\,\mu_0(\omega)}{\sigma(p_i(m))} = \frac{\sum_{m' \in p_i(m)} \sigma(m' \mid \omega)\,\mu_0(\omega)}{\sum_{m' \in p_i(m)} \sigma(m')}$$
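The updating rule above can be illustrated numerically. The following is a minimal sketch, assuming a finite state space and a strategy stored as a matrix `sigma[w, m]`; the function name `coarse_posterior` and all numbers are hypothetical and chosen only for illustration.

```python
import numpy as np

def coarse_posterior(sigma, prior, cell):
    """Posterior over states for a receiver who only learns that the
    realized message lies in `cell` (a list of message indices).

    sigma[w, m] = probability of message m in state w (rows sum to 1).
    prior[w]    = prior probability of state w.
    """
    # sigma(p_i(m) | w): total probability of the cell in each state
    cell_likelihood = sigma[:, cell].sum(axis=1)
    joint = cell_likelihood * prior      # numerator of Bayes' rule
    return joint / joint.sum()           # divide by sigma(p_i(m))

# Hypothetical numbers: two states, four messages.
prior = np.array([0.5, 0.5])
sigma = np.array([[0.4, 0.1, 0.3, 0.2],
                  [0.1, 0.4, 0.2, 0.3]])

pooled = coarse_posterior(sigma, prior, [0, 1])   # receiver pooling m1 and m2
isolated = coarse_posterior(sigma, prior, [0])    # receiver who isolates m1
print(pooled, isolated)
```

With these numbers the receiver who pools the first two messages learns nothing, while the receiver who isolates the first message moves to a posterior of 0.8 on the first state.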


As such, an audience's understanding is characterized by a collection of partitions
$\{P_i\}_{i \in \{1, \dots, n\}}$ as well as a collection of preferences $\{u_i\}_{i \in \{1, \dots, n\}}$. In order to delimit
the types of understandings we consider, we introduce two definitions:


Definition 5 (Partition refinement). A partition $P'$ is a refinement of partition $P$
if every element of $P'$ is a subset of some element of $P$.

Definition 6 (Refinement order). A collection of partitions $\{P_i\}_{i \in \{1, \dots, n\}}$ is said to
allow for a refinement order if $P_j$ is a refinement of $P_i$ whenever $i < j$.
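Definitions 5 and 6 translate directly into a membership check. The sketch below assumes partitions are given as lists of Python sets, ordered from receiver 1 (coarsest) onwards; the helper names and the message labels are hypothetical.

```python
def is_refinement(fine, coarse):
    """True if every cell of `fine` is a subset of some cell of `coarse`."""
    return all(any(set(p) <= set(q) for q in coarse) for p in fine)

def satisfies_refinement_order(partitions):
    """partitions[0] belongs to receiver 1; checks that P_j refines P_i for all i < j."""
    n = len(partitions)
    return all(is_refinement(partitions[j], partitions[i])
               for i in range(n) for j in range(i + 1, n))

# Hypothetical partitions of M = {m1, m2, m3, m4}.
P1 = [{"m1", "m2"}, {"m3", "m4"}]
P2 = [{"m1"}, {"m2"}, {"m3", "m4"}]
P3 = [{"m1"}, {"m2"}, {"m3"}, {"m4"}]
Q = [{"m1", "m3"}, {"m2", "m4"}]   # cuts across the cells of P1

print(satisfies_refinement_order([P1, P2, P3]))
print(satisfies_refinement_order([P1, Q]))
```

The collection (P1, P2, P3) can be ordered by refinement, whereas Q neither refines nor is refined by P1, which is the kind of audience the paper rules out.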


In this paper we consider collections of partitions satisfying a refinement order,
such that we can label the different members of the audience according to how finely
they are able to understand the informational content of messages. Whenever a
message $m \in M$ is realized, $p_1(m)$ can be seen as the text (the aspect of the message
that is commonly understood by all members of the audience), whereas $p_i(m)$ for
$i \geq 2$ represent the different depths of subtext present in the message.


3.2.3 Two Interpretations of the Model


There are two ways one might interpret the model. One is the literal interpretation
that each message realization $m \in M$ is public, but agents vary in how finely
they understand its informational content. This interpretation relates to the
notion of language competence developed in Blume and Board (2013). Under this
interpretation, one could think of the set $M$ as a set of sentences in English, for
instance. A receiver who does not speak English at all won't be able to differentiate
between any of the sentences and thus won't extract any information from a given
message, whereas a receiver with some knowledge of the language will be able to
distinguish between more sentences and thus capture finer meaning from
them. Such differences are present even among native speakers: whereas some people
might not distinguish between two words with the same denotative meaning (text),
others might be aware of differences in their connotative meaning (subtext), and
thus be responsive to the usage of one word rather than the other.

Figure 3.1: Three partitions satisfying a refinement order: in black $P_1$, in red $P_2$ and
in blue $P_3$.

Importantly, such heterogeneity in understanding is often reflective of heterogeneity
in group identity: people's ability to identify something as particularly
meaningful depends on their past experiences, education or interests, all of which
tend to be correlated with their preferences.

A second interpretation, closer to the idea of information systems present in
Galperti and Perego (2020), is that the sender is constrained in her ability to target
different groups. Imagine for instance that the audience is composed of two receivers,
where the sender is constrained to communicate only publicly with receiver 1 whereas
receiver 2 can be targeted privately. This setting could be represented by a message
space $M$ containing different messages $(m_1, m_2)$, where $m_1$ is the realization of the
public message and $m_2$ the realization of the private message, and where $P_2$ is a fully
refined partition of $M$ whereas $P_1$ is composed of several partition elements pooling
together, for a given realization of $m_1$, all the different possible realizations of $m_2$.


3.3 Results


3.3.1 Feasibility 


In this section we characterize the feasible distributions of posteriors that the sender
can induce through her strategy $\sigma$, as well as the value that she can achieve through
persuasion.


Lemma 4. Consider two receivers $i$ and $j$ such that $P_j$ is a refinement of $P_i$. Then,

$$\mu_i(\omega \mid p_i) = \sum_{p_j \,:\, p_j \subseteq p_i} \sigma(p_j \mid p_i)\,\mu_j(\omega \mid p_j)$$

Lemma 4 ties together the posterior beliefs of two agents $j$ and $i$: it establishes
that, following the realization of some message belonging to a partition element $p_i$
of the less refined receiver, the expected posterior belief of receiver $j$ must be $i$'s
realized posterior $\mu_i(\omega \mid p_i)$.

Sender's communication strategy $\sigma$ induces a joint distribution of beliefs in the
audience, which we denote by $\tau$. The following proposition characterizes the joint
distributions of posterior beliefs that the sender can feasibly induce given some
strategy $\sigma$.


Proposition 9 (Feasible distributions of posteriors). Let the audience's understandings
$\{P_i\}_{i \in \{1, \dots, n\}}$ satisfy a refinement order. Sender can induce any joint distribution
of posteriors $\tau(\mu_1, \dots, \mu_n)$ such that:

$$\sum_{\mathrm{supp}(\tau_1)} \mu_1\,\tau_1(\mu_1) = \mu_0$$

and

$$\sum_{\mathrm{supp}(\tau_{j|i})} \mu_j\,\tau_{j|i}(\mu_j \mid \mu_i) = \mu_i, \quad \forall i < j.$$

Proposition 9 establishes that if the audience's understandings satisfy a refinement
order, Sender's strategy $\sigma$ can induce any joint distribution of posteriors such that
i) the beliefs of the coarsest agent satisfy the standard Bayes Plausibility condition
(Kamenica and Gentzkow, 2011) and ii) conditional on the realization of some
posterior $\mu$ of some agent, the expected posterior of any more refined agent must be
$\mu$. Note that these conditions imply that the beliefs of any agent $i \in \{1, \dots, n\}$ also
satisfy the Bayes Plausibility condition.
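The two conditions can be checked numerically for any given strategy. The sketch below is only an illustration under assumed values: the prior, the matrix `sigma` and the two partitions are hypothetical, and the code verifies that the posteriors they generate satisfy conditions i) and ii) for a two-receiver audience.

```python
import numpy as np

prior = np.array([0.3, 0.7])                 # mu_0 over two states
sigma = np.array([[0.5, 0.2, 0.2, 0.1],      # sigma(m | first state)
                  [0.1, 0.3, 0.2, 0.4]])     # sigma(m | second state)
P1 = [[0, 1], [2, 3]]                        # coarsest receiver (text)
P2 = [[0], [1], [2], [3]]                    # refined receiver (subtext)

def prob(cell):
    """Unconditional probability that the realized message lies in `cell`."""
    return float((sigma[:, cell].sum(axis=1) * prior).sum())

def posterior(cell):
    """Posterior over states after learning that the message lies in `cell`."""
    joint = sigma[:, cell].sum(axis=1) * prior
    return joint / joint.sum()

# Condition i): the coarsest receiver's posteriors average back to the prior.
mean_coarse = sum(prob(c) * posterior(c) for c in P1)

# Condition ii): given each text cell, the refined receiver's posteriors
# average back to the coarse posterior attached to that cell.
gaps = []
for text in P1:
    subtexts = [c for c in P2 if set(c) <= set(text)]
    cond_mean = sum(prob(c) / prob(text) * posterior(c) for c in subtexts)
    gaps.append(float(np.abs(cond_mean - posterior(text)).max()))

print(mean_coarse, gaps)
```

Both identities hold for any choice of `sigma`, which is the necessity direction of the proposition; the sufficiency direction (that every such joint distribution can be generated) is what the proposition adds.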


3.3.2 Optimality 


Proposition 9 establishes that the problem of the Sender can be viewed as a sequential
information design problem: one of designing the distribution of beliefs of the coarsest
agent and then, conditional on that, designing the distribution of beliefs of the second
coarsest agent, and so forth.

Since $v$ is additively separable in the actions taken by the audience, we can write
Sender's indirect utility as $\hat v(\mu_1, \dots, \mu_n) = \sum_{i=1}^{n} \hat v_i(\mu_i)$, where $\hat v_i(\mu_i) = v_i(\hat a_i(\mu_i))$
and $\hat a_i(\mu_i)$ denotes the action chosen by receiver $i$ under belief $\mu_i$.
Figure 3.2: Sender's indirect utility


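Define:

$$V_i(\mu_{i-1}) = \begin{cases} \sup\{z \mid (\mu_{i-1}, z) \in co(\hat v_i + V_{i+1})\} & \text{for } i \in \{1, \dots, n-1\} \\ \sup\{z \mid (\mu_{i-1}, z) \in co(\hat v_n)\} & \text{for } i = n \end{cases}$$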


Proposition 10. (Recursive Concavification) The value of an optimal signal for the
sender is given by $V_1(\mu_0)$.


The standard approach for finding the sender-optimal information structure in
persuasion settings involves computing the concave envelope of sender's indirect
utility function and evaluating it at the prior belief. This approach is suitable in the
multi-receiver case when communication is entirely public (i.e. when there's only a
text), but doesn't apply directly in our setting since here receivers of different groups
will form different posteriors after observing the same message. Instead, in our case
the sender-optimal information structure can be found recursively, by identifying
the optimal distribution of $\mu_n$ conditional on some realization of $\mu_{n-1}$,
and then moving backwards and incorporating the value of such optimal distribution
into the identification of the optimal distribution of $\mu_{n-1}$, and so on.
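The backward procedure can be sketched numerically for a binary state, where beliefs live on the unit interval. This is a grid approximation under stated assumptions, not the paper's proof technique: the function names are hypothetical, the concave envelope is computed with a standard upper-hull scan, and the two step functions used for illustration assume thresholds 0.7 and 0.3 with sender payoffs normalized to 1.

```python
import numpy as np

def concave_envelope(x, f):
    """Smallest concave function above the points (x[k], f[k]), on the grid x.

    x must be strictly increasing. Uses a single upper-hull scan.
    """
    hull = []  # indices of grid points on the upper hull
    for k in range(len(x)):
        while len(hull) >= 2:
            i, j = hull[-2], hull[-1]
            # j is dropped when it lies on or below the chord from i to k
            cross = (x[j] - x[i]) * (f[k] - f[i]) - (f[j] - f[i]) * (x[k] - x[i])
            if cross >= 0:
                hull.pop()
            else:
                break
        hull.append(k)
    return np.interp(x, x[hull], f[hull])

def recursive_concavification(x, v_hats):
    """v_hats[0] belongs to the coarsest receiver, v_hats[-1] to the most refined.

    Returns V_1 evaluated on the grid x.
    """
    V = np.zeros_like(x)
    for v_hat in reversed(v_hats):       # start from the most refined receiver
        V = concave_envelope(x, v_hat + V)
    return V

# Assumed illustration: receiver 1 acts iff mu >= 0.7, receiver 2 iff mu <= 0.3.
k = np.arange(101)
x = k / 100.0
v_hat_1 = (k >= 70).astype(float)
v_hat_2 = (k <= 30).astype(float)

V1 = recursive_concavification(x, [v_hat_1, v_hat_2])
public = concave_envelope(x, v_hat_1 + v_hat_2)
private = concave_envelope(x, v_hat_1) + concave_envelope(x, v_hat_2)
print(V1[50], public[50], private[50])   # values at the prior mu_0 = 0.5
```

At a prior of 0.5 the recursive value is about 1.306, between the public value 1 and the private value of about 1.429. The thresholds are placed on grid points so that the envelope is exact; with thresholds off the grid the result would only be approximate.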


3.4 Example 


Consider a simple setting where the audience is composed of two agents $i \in \{1, 2\}$,
each of whom chooses an action $a_i \in \{l, r\}$. Receivers' payoffs depend on their
actions and on a binary state of the world $\omega \in \{L, R\}$, distributed according to a
common prior $\mu_0 = \Pr(\omega = R)$.

Sender wants to induce receivers to take $a = r$ and has utility $v(a_1, a_2) = v_1 \mathbb{1}\{a_1 = r\} + v_2 \mathbb{1}\{a_2 = r\}$.
For that purpose she chooses among a family of distributions
$\{\sigma(\cdot \mid \omega)\}_{\omega \in \{L, R\}}$ over a message space $M = \{m_1, m_2, m_3, m_4\}$.
Figure 3.3: Recursive concavification


Each receiver has a distinct preference. Define $\Delta u_i^R = u_i(r, R) - u_i(l, R)$ and
$\Delta u_i^L = u_i(r, L) - u_i(l, L)$, such that each of the receivers has a distinct threshold
belief $\bar\mu_i = \frac{-\Delta u_i^L}{\Delta u_i^R - \Delta u_i^L}$ at which they are indifferent between the two actions.
Consider $\Delta u_1^R, \Delta u_2^L > 0$ and $\Delta u_1^L, \Delta u_2^R < 0$, such that receiver 1 wants to match the
state whereas receiver 2 does not, and assume that $\bar\mu_2 < \bar\mu_1$, such that at no belief
both receivers would be willing to take $a = r$. Figure 3.2 illustrates sender's indirect
utility $\hat v$ in this case.

Consider first the case where both receivers hold the same partition $P$, with $|P| \geq 2$.
In that case there is no heterogeneity in receivers' understandings, meaning that they
will form the same posterior beliefs after any message realization.
In this case the sender is never able to induce both agents to simultaneously take her
preferred action, and as such designs $\sigma$ so as to target one of the receivers optimally.
Sender's value in this case is given by the concave envelope of $\hat v$ evaluated at the
prior belief $\mu_0$.

Now imagine that receivers differ in their understanding of the message, and
instead hold the following partitions:

$$P_1 = \{\{m_1, m_2\}, \{m_3, m_4\}\}$$
$$P_2 = \{\{m_1\}, \{m_2\}, \{m_3\}, \{m_4\}\}$$


The sender can now exploit the audience's heterogeneous understandings in order
to convey some information privately to receiver 2 by assigning different conditional
distributions to messages that belong to the same partition element of receiver 1.
This amounts to designing a subtext (for instance the informational content of $m_1$
and $m_2$) conditional on the text (the realization of $\{m_1, m_2\}$).

To understand what the sender can achieve through the subtext, consider the left
panel of figure 3.3. From proposition 9 we know that, given any posterior realization
$\mu_1$ of the coarsest receiver, the more refined receiver can hold any distribution of
posteriors that average to $\mu_1$. As such, for a given $\mu_1$, sender can achieve through
the subtext any value $z$ such that $(\mu_1, z) \in co(\hat v_2)$. An optimal subtext is then given
by the distribution of posteriors that achieves $\sup\{z \mid (\mu_1, z) \in co(\hat v_2)\}$, as illustrated
in figure 3.3.

Figure 3.4: The value of persuasion under different modes of communication: text
and subtext in black, public in red and private in blue.

Knowing that the value of the subtext is given by the concave envelope of $\hat v_2$
allows us to know precisely the payoff that can be achieved through the text: for
any belief $\mu_1$ that sender generates, its value is given by the payoff it achieves from
the coarsest agent plus the value of the subtext that is achievable under such $\mu_1$. As
such, by summing $\hat v_1$ with the concave envelope of $\hat v_2$ we obtain a function denoting
the maximum payoff that the sender can achieve for any belief $\mu_1$ that she induces.
Taking the concave envelope of this function and evaluating it at $\mu_0$ tells us the value
that the sender can achieve given the prior.

Figure 3.4 shows a comparison of the value of persuasion under different modes
of communication. It depicts in red the value under public communication, in blue
the value under private communication and in black the value with text and subtext.
A few things are worth noting. First, the value of both private communication
and communication with text and subtext is everywhere above the value of public
communication. This is because non-public modes of communication always have
some probability of inducing both agents to simultaneously take the sender's preferred
action. Second, the value of private communication is greater than the value of text
and subtext for $\mu_0 < \bar\mu_1$, but both values coincide for $\mu_0 \geq \bar\mu_1$. This is so because
when $\mu_0 \geq \bar\mu_1$ receiver 1 is already taking sender's preferred action by default, such
that she can stay silent in the text and simply target receiver 2 optimally with the
subtext. Whenever $\mu_0 < \bar\mu_1$, however, sender needs to convey some information to
receiver 1 in order to persuade him to choose $a_1 = r$. In doing so she makes the
task of persuading receiver 2 harder, as both receivers have opposite preferences and
require different information to be convinced.
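The case of a prior above receiver 1's threshold can be made concrete with an explicit strategy. The sketch below is a hypothetical numerical instance, not taken from the paper: it assumes thresholds 0.7 and 0.3, sender payoffs $v_1 = v_2 = 1$ and a prior of 0.8, and builds a strategy that uses only $m_1$ and $m_2$, so that the text is uninformative while the subtext splits receiver 2's belief between 0.3 and 1.

```python
import numpy as np

mu0 = 0.8                    # assumed prior Pr(w = R), above receiver 1's threshold
bar1, bar2 = 0.7, 0.3        # assumed thresholds of receivers 1 and 2
prior = np.array([1 - mu0, mu0])

# Rows: states (L, R); columns: messages m1..m4. Only m1 and m2 are sent,
# and both lie in the text cell {m1, m2}, so receiver 1 learns nothing.
sigma = np.array([[1.0,    0.0,     0.0, 0.0],    # sigma(. | L)
                  [3 / 28, 25 / 28, 0.0, 0.0]])   # sigma(. | R)

def prob(cell):
    return float((sigma[:, cell].sum(axis=1) * prior).sum())

def belief_R(cell):
    joint = sigma[:, cell].sum(axis=1) * prior
    return float(joint[1] / joint.sum())

eps = 1e-9                   # tolerance for beliefs sitting exactly on a threshold
mu1 = belief_R([0, 1])       # receiver 1 only observes the text cell
payoff = 0.0
for m in (0, 1):
    mu2 = belief_R([m])      # receiver 2 observes the exact message
    acts1 = int(mu1 >= bar1 - eps)
    acts2 = int(mu2 <= bar2 + eps)
    payoff += prob([m]) * (acts1 + acts2)

print(mu1, payoff)
```

Receiver 1 keeps the prior belief and always chooses $r$, while receiver 2 chooses $r$ after $m_1$, which occurs with probability 2/7. The resulting payoff of 9/7 equals what private communication would deliver at this prior under the same assumed parameters.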
Appendices


Appendix A 


Appendix for Chapter 1 


A.1 Proof of lemma 1 


Proof. Let o € & and suppose that there exist ju,’ € supp(o) with p(j) = p(p’). 


Consider the following market: 


= a4 a2 


By the convexity of X,,), p(/) = p(t). Define o’ in the following way: o'(j1) = 
o(u) + o(u’), o' (uw) = o'(u’) = 0 and o’ = o otherwise. It is easy to check that


yl pesupp(o) 7H) WH) = dT resupp(or) 7 (Hu) W(). We can iterate this operation as 
many times as the number of pairs v,v’ € supp(o’) such that p(v) = p(v’) to finally 


obtain the desired conclusion. 


A.2 Proof of lemma 2 


Proof. Let u* be an inefficient aggregate market, hence for any optimal segmentation 
a € U(p*), |supp(o)| > 2. Let o be a direct and optimal segmentation of u* and 
fi € supp(c) such that y is in the interior of X,,,). Let v be any other market in the 


support of 0. Consider the market: 


o(H) ___ ov) 
a(u) tov) a(n) +a(v) 


ES 


Because ju* is inefficient, it is without loss of generality to assume that € is also 
inefficient. 
Denote ji (resp. v) the projection of € on the boundary of the simplex M in 


direction of yu (resp. v). For o to be optimal, the segmentation of € between jz with 
__ oly)
roan must be optimal. In particular, 
it must be optimal among any segmentation on [i Vj. 


probability aot and v with probability 


There exists a one-to-one mapping f: [fi,7] — [0,1] such that for any y € [fi 7, 
y= f(iyet+(—-fl(y))v. Thus, the set [f,7] can be seen as all the distributions 
on a binary set of states of the world {ji,v}, where for any y € |fi,], f(y) is the 
probability of ji. 


Therefore, the maximization program, 


max SY) o(y)W(7) (S) 


yEsupp(o) 


st.0 € DM) = Coe A(fa,7])| SY) oly)y=& supp(c) < 00, 


yesupp(c) 


is a bayesian persuasion problem (Kamenica and Gentzkow, 2011), with a binary state 
of the world and a finite number of actions. Hence, applying theorem 1 in Lipnowski 
and Mathevet (2017), there exists an optimal segmentation only supported on extreme 
points of sets Me Ml = {M,N [f,7] | k € {1,...,K} and M,N [f,7] AO}. It 
happens that for any M ¢ M7], so that M = M;,2 [f,7] for some k, if y is an 
extreme point of M, then it is on the boundary of (M;). 

Let (u,v) with respective probabilities (a, 1—«a) be a solution to ($) where pi’ 
and v’ are extreme points of some M € Ml+’], We now consider the segmentation 
o such that a(7) = o(7) for all y © supp(c) \ {u,v}, o(w’) = (o(u) + o(v))a, 
a(v') = (o(m) + o(v))(1 — a), and o = 0 otherwise. One can easily check that 
o € U(u*). If o is not direct, that is, there exists y € supp(a@) such that (w.Lo.g.) 
p(y) = p(w’), then construct a direct segmentation o following the same process 
as in the proof of lemma 1. Then, if o is not only supported on boundaries of 


sets {Mx}rer(u*), reiterate the same process as above, until you reach the desired 


conclusion. 


A.3 Proof of proposition 2 


Proof. Fix an aggregate market p* and let 0 € U(u*) be optimal and direct. Suppose 
by contradiction that there exist ju,’ € supp(c) such that vg, := min{supp(1)} < 
max{supp(u’)} =: vq and vp := min{supp(pu’)} < max{supp(j4)} =: u.. Assume 
further, without loss of generality, that min{supp()} < min{supp(w’)}. 

Define fi := ota H+ Gy ou a5 5M. A consequence of a being optimal is that 


V (fi) = a W (H) + soa W (ul! ). The proof consists in showing that we can 


improve on this splitting of / and thus obtains a contradiction. 
Define, for small € > 0, fi, {v’ as follows:


ete ifk—=b 
fe=\ Ue—-e€ ifk=c 


Lk otherwise. 


=) p+ The ifk=c 
ie otherwise. 


By construction, fi = oneal + sill . Note that vq is still an optimal 
price for jt. Indeed, for any vg < vg < vy, the profit made by fixing price vz is equal 
in markets yz and fi and for any up < vz < vu, the profit made by fixing price vy, is 
strictly lower in 1 than in yp. On the contrary, @(/u’) > o(’) and it is possible that 
the inequality holds strictly. In any case, it must be that ¢(ji’) = uv, for b<e <d. 


o(H) 5) hence 24 a 


a(n) +o(u’) ow) =a" 


Denote a := 


+(1- a) ( = > Apply (Ve — Ub) — >: Apby(Up — Uo) + Acq ° €(Ue — Ve)) 

k>e b<k<e ie, 

=aery(vp — Va) — AEAC(Ve — Va) — (1 — @) ( a Arf, (Ve — Up) + ~ Arby, (VE — v»)) 
k>e b<k<e (As) 

>acrp(Vp — Va) — WEAK41(Ve — Ua) — (1 — a) ( S- April, (Ve — Vo) + by Avril, (VE — Up)) 

k>e b<k<e ne 

=aery (Up — Va) — Adar jac(ve — Uq) — (1 — a) ( >. [y(Ve — Up) + > [ly (VpE — v))| 
k>e b<k<e ee 


Finally, 


(A.7) > 0 => Bam > K 
b+1 




where 


OE te — Oz) = (1 =a) ( sel Vee) +S ice OR vp) 
ae(Up — Ua) 


K= 


which ends the proof. 


A.4 Proof of proposition 3 


Proof. As argued in the core of the text, all markets with uniform price $v_u$
belonging to the no-rent region must be optimally segmented by splitting $\mu^*$ between
ite pr Huh Mig 


POS Cy Bye ey fy Oyag DD) amd pw? = (0,0,...5 00, f= ,..., 2). Such a segmen- 


tation indeed gives no rents to the monopolist if v, is an optimal price in both p* 
and yw”. That is, if: 


u-1 


=i >0j(D Eb +ys)  V2<j<u-1 (NR-s) 
=) e 
K * 

mem Tag) Vutl<j<K (NR-r) 


As such, any optimal segmentation under strong redistributive preferences that 


UL Vu 


maximizes consumer surplus must have p> = a = Fa ea g; and 2 = 
My Pu— dejar HEMI 

SE teat 
are satisfied whenever o 


which pins down segmentation a”. Conditions (NR-s) and (NR-r) 
NB is efficient, which concludes the proof. 
It is also interesting to note that conditions (NR-s) and (NR-r) define the no-rent 


region inside MM, as a convex polytope. Indeed, we can rearrange both conditions 


and get: 
j-l1 u-1 
0>-a(j) So uf+(-o(7)) Soup V2<j<u-1 (NR-s) 
i=1 i=j 
j-1 K 
Gera) 2 BU) Dew + A - 8G) Do Vutl<j<k (NR) 
TSM i=u i=j 


for a(j) = 2&4) and B(j) = 2H 


vj(Vu—01) Vu-V1)° 


The conditions expressed above define K — 2 half-spaces in R*. The no-rent 


region in MM, is thus given by the closed polytope defined by the intersection of such 


half-spaces. We can represent such polytope as follows: 


NRR, = {we My: Ap < 2}, 




with 


0 RK-2 


Vapi (u—V1) 


oe 
UK (Vu-U1) 


where Og and Op are null matrices with, respectively, dimensions (u — 2) x (u — 1) 


and (K — u) x (K +1 —), and 


—a(2) 1—a(2) 
—a(3) —a(3) 

S= 
—a(u—2) —a(u—2 
—a(u-—1) —a(u-—1) 

—B(ut+1) 1-Bu+1) 
—B(ut+2)  —B(u+ 2) 
R= 
=S (1) «=p UK =) 
—B(K) —B(K) 
for a(j) = aie and 6(j) = 


R(K-u)x (K+1-u) 
1 — a(2) 1 — a(2)
1 — a(3) 1 — a(3) 
.  ROH-2)x(u- 1), 
l1—a(u—2) 1l—a(u—2) 
—a(u-1) l-—a(u—1) 
1—8(u+1) 1-8(u+1) 
1—Blu+2) 1-(ut2) 
€ 
1-@(K-1) 1-A(K-1) 
—B(K) 1— 6(K) 
vj (ui) 


Appendix B 


Appendix for Chapter 2 


B.1 Proof of proposition 4 


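Let $\Theta$ be any Polish space and let $\Delta(\Theta)$ be the set of probability measures on $\Theta$
endowed with its Borel $\sigma$-algebra; let also $C_b(\Theta)$ be the set of bounded continuous
and Borel-measurable real-valued functions on $\Theta$.

For any $\eta, \mu \in \Delta(\Theta)$, by application of the Donsker-Varadhan variational formula
(see Dupuis and Ellis, 1997, Lemma 1.4.3) we have

$$C(\eta, \mu) = \sup_{u(a,\cdot) \in C_b(\Theta)} \int_\Theta \rho u(a, \theta)\,\eta(d\theta) - \ln\left(\int_\Theta \exp(\rho u(a, \theta))\,\mu(d\theta)\right). \tag{B.1}$$

Taking the Legendre-Fenchel dual of the variational equality (B.1) (see Dupuis and
Ellis, 1997, Proposition 1.4.2) we get

$$\ln\left(\int_\Theta \exp(\rho u(a, \theta))\,\mu(d\theta)\right) = \sup_{\eta \in \Delta(\Theta)} \int_\Theta \rho u(a, \theta)\,\eta(d\theta) - C(\eta, \mu). \tag{B.2}$$

Hence, we have

$$V_a(\mu) = \frac{1}{\rho} \ln\left(\int_\Theta \exp(\rho u(a, \theta))\,\mu(d\theta)\right)$$

for any $a \in A$, any $\mu \in \Delta(\Theta)$ and any $\rho \in \mathbb{R}_+^*$. Moreover, the supremum in
equation (B.2) is attained uniquely by the probability measure $\eta_a(\mu) \in \Delta(\Theta)$ defined
by

$$\eta_a(\mu)(\tilde\Theta) = \frac{\int_{\tilde\Theta} \exp(\rho u(a, \theta))\,\mu(d\theta)}{\int_\Theta \exp(\rho u(a, \theta))\,\mu(d\theta)}$$

for any Borel set $\tilde\Theta \subseteq \Theta$ (see, again, Dupuis and Ellis, 1997, Proposition 1.4.2).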
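The variational equality (B.2) and its maximizer can be checked numerically on a finite state space. The sketch below is an illustration under assumed values: the payoff vector `u`, the belief `mu` and the parameter `rho` are hypothetical, and the cost is taken to be the Kullback-Leibler divergence as in the text.

```python
import numpy as np

rng = np.random.default_rng(0)
rho = 2.0
u = np.array([1.0, 0.2, -0.5])     # hypothetical payoffs u(a, theta) for a fixed action
mu = np.array([0.2, 0.5, 0.3])     # Bayesian belief over three states

def kl(eta, mu):
    """Kullback-Leibler divergence of eta from mu (finite states)."""
    mask = eta > 0
    return float(np.sum(eta[mask] * np.log(eta[mask] / mu[mask])))

def objective(eta):
    """Right-hand side of (B.2) before taking the supremum."""
    return rho * float(eta @ u) - kl(eta, mu)

# Left-hand side of (B.2) and the candidate maximizer (exponential tilting of mu).
log_mgf = float(np.log(np.sum(np.exp(rho * u) * mu)))
eta_star = np.exp(rho * u) * mu / np.sum(np.exp(rho * u) * mu)

# Randomly drawn beliefs should never do better than the tilted belief.
others = rng.dirichlet(np.ones(3), size=1000)
best_other = max(objective(e) for e in others)

print(log_mgf, objective(eta_star), best_other)
```

The tilted belief attains the log moment generating function exactly, and none of the random draws exceeds it, in line with the uniqueness statement above.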


In fact, we can extend the result beyond the case of the Kullback-Leibler divergence.
Define the $\varphi$-divergence between $\eta$ and $\mu$ as

$$D_\varphi(\eta \,\|\, \mu) = \int_\Theta \varphi\left(\frac{d\eta}{d\mu}(\theta)\right)\mu(d\theta),$$

where $\varphi : \mathbb{R} \to \mathbb{R}_+$ is a proper, closed, convex and essentially smooth function such
that $\varphi(1) = 0$ and such that its domain is an interval with endpoints $a < 1 < b$
(which may be finite or infinite). Let us also define the Legendre-Fenchel conjugate
of $\varphi$, denoted $\varphi^*$, by

$$\varphi^*(y) = \max_{x \in \mathbb{R}} \; xy - \varphi(x)$$

for any $y \in \mathbb{R}$. Then, the following proposition holds.


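Proposition 11. Receiver's belief motivated by action $a$ under posterior $\mu$ uniquely
satisfies

$$\varphi'\left(\frac{d\eta_a(\mu)}{d\mu}(\theta)\right) = \rho u(a, \theta)$$

for any $\theta \in \Theta$, any $a \in A$ and any $\mu \in \Delta(\Theta)$, while Receiver's optimal psychological
payoff equals

$$W_a(\mu) = \frac{1}{\rho} \int_\Theta \varphi^*(\rho u(a, \theta))\,\mu(d\theta),$$

for any $a \in A$ and any $\mu \in \Delta(\Theta)$.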


Proof. This proposition is a direct application of Theorem 4.4 in Broniatowski and 
Keziou (2006). 


B.2 Overoptimism about preferred outcomes 


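Fix an $a \in A$ and let $\Theta_a$ be the (measurable) set of states such that $\Theta_a =
\arg\max_{\theta \in \Theta} u(a, \theta)$. Define $\delta(a, \theta) = u(a, \theta) - u(a, \theta^*)$ for all $\theta$ and some $\theta^* \in \Theta_a$.
Remark that $\eta_a(\mu)(\Theta_a)$ can be expressed as follows:

$$\eta_a(\mu)(\Theta_a) = \frac{\int_{\Theta_a} \exp(\rho u(a, \theta))\,\mu(d\theta)}{\int_\Theta \exp(\rho u(a, \theta))\,\mu(d\theta)} = \frac{\mu(\Theta_a)}{\mu(\Theta_a) + \int_{\Theta \setminus \Theta_a} \exp(\rho\,\delta(a, \theta))\,\mu(d\theta)}.$$

Let's define the function

$$h(\rho) = \frac{\mu(\Theta_a)}{\mu(\Theta_a) + \int_{\Theta \setminus \Theta_a} \exp(\rho\,\delta(a, \theta))\,\mu(d\theta)}$$

for any $\rho \in \mathbb{R}_+$.

First, remark that $h(0) = \mu(\Theta_a)$. Moreover, by Leibniz integral rule, we have

$$h'(\rho) = \frac{-\mu(\Theta_a) \int_{\Theta \setminus \Theta_a} \delta(a, \theta) \exp(\rho\,\delta(a, \theta))\,\mu(d\theta)}{\left(\mu(\Theta_a) + \int_{\Theta \setminus \Theta_a} \exp(\rho\,\delta(a, \theta))\,\mu(d\theta)\right)^2} > 0$$

for any $\rho \in \mathbb{R}_+$, since $\delta(a, \theta) < 0$ for all $\theta \in \Theta \setminus \Theta_a$. Finally, we also have that $\lim_{\rho \to +\infty} h(\rho) = 1$.
Hence the probability of payoff-maximizing states is bounded below by the Bayesian
posterior $\mu(\Theta_a)$, is always increasing and converges to 1 from below. Hence, a
wishful Receiver always puts more probability mass on $\Theta_a$ than a Bayesian and
eventually believes that the state belongs to $\Theta_a$ with probability 1 when $\rho$ becomes
large.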
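The three properties of $h$ can be checked on a small finite example. The sketch below assumes four states with hypothetical payoffs and beliefs (two states tie for the highest payoff), and compares $h$ with the mass that the tilted belief puts on the payoff-maximizing states.

```python
import numpy as np

u = np.array([1.0, 1.0, 0.4, -0.3])    # hypothetical payoffs u(a, theta)
mu = np.array([0.1, 0.2, 0.3, 0.4])    # Bayesian posterior
best = u == u.max()                    # indicator of Theta_a
delta = u - u.max()                    # delta(a, theta), zero on Theta_a

def h(rho):
    """Mass on Theta_a written in terms of delta, as in the text."""
    return mu[best].sum() / (mu[best].sum()
                             + np.sum(np.exp(rho * delta[~best]) * mu[~best]))

def tilted_mass(rho):
    """Mass that the exponentially tilted belief puts on Theta_a."""
    w = np.exp(rho * u) * mu
    return w[best].sum() / w.sum()

rhos = np.linspace(0.0, 20.0, 81)
values = np.array([h(r) for r in rhos])
print(values[0], values[-1])
```

The first value equals the Bayesian mass 0.3 and the sequence increases towards 1 as the parameter grows.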


B.3 Proof of lemma 3 


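Let us study the properties of the belief threshold $\mu^{W}$ as a function of $\rho$ and payoffs. First of all, let us define the function

$$\mu(\rho) = \frac{\exp(\rho \underline{u}_{0}) - \exp(\rho \underline{u}_{1})}{\exp(\rho \underline{u}_{0}) - \exp(\rho \underline{u}_{1}) + \exp(\rho \overline{u}_{1}) - \exp(\rho \overline{u}_{0})}$$

for any $\rho \in \mathbb{R}^{*}_{+}$. To avoid notational burden, we omit the superscript $W$ in the proof.

We can find the limit of $\mu(\rho)$ at 0 by applying l'Hôpital's rule:

$$\lim_{\rho \to 0} \mu(\rho) = \lim_{\rho \to 0} \frac{\underline{u}_{0} \exp(\rho \underline{u}_{0}) - \underline{u}_{1} \exp(\rho \underline{u}_{1})}{\underline{u}_{0} \exp(\rho \underline{u}_{0}) - \underline{u}_{1} \exp(\rho \underline{u}_{1}) + \overline{u}_{1} \exp(\rho \overline{u}_{1}) - \overline{u}_{0} \exp(\rho \overline{u}_{0})} = \frac{\underline{u}_{0} - \underline{u}_{1}}{\underline{u}_{0} - \underline{u}_{1} + \overline{u}_{1} - \overline{u}_{0}}.$$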


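So, we are back to the case of a Bayesian Receiver whenever the cost of distortion becomes infinitely high. After multiplying by $\exp(-\rho \underline{u}_{0})$ at the numerator and the denominator of $\mu(\rho)$ we get

$$\mu(\rho) = \frac{1 - \exp(\rho(\underline{u}_{1} - \underline{u}_{0}))}{1 - \exp(\rho(\underline{u}_{1} - \underline{u}_{0})) + \exp(\rho(\overline{u}_{1} - \underline{u}_{0})) - \exp(\rho(\overline{u}_{0} - \underline{u}_{0}))}.$$

So the limit of $\mu^{W}$ at infinity only depends on the sign of $\overline{u}_{1} - \underline{u}_{0}$ as, by assumption, $\underline{u}_{1} - \underline{u}_{0} < 0$ and $\overline{u}_{0} - \overline{u}_{1} < 0$. Hence, $\lim_{\rho \to +\infty} \mu(\rho) = 1$ when $\overline{u}_{1} - \underline{u}_{0} < 0$ and $\lim_{\rho \to +\infty} \mu(\rho) = 0$ when $\overline{u}_{1} - \underline{u}_{0} > 0$. Finally, in the case where $\underline{u}_{0} = \overline{u}_{1}$ we have

$$\lim_{\rho \to +\infty} \mu(\rho) = \lim_{\rho \to +\infty} \frac{1 - \exp(\rho(\underline{u}_{1} - \underline{u}_{0}))}{2 - \exp(\rho(\underline{u}_{1} - \underline{u}_{0})) - \exp(\rho(\overline{u}_{0} - \overline{u}_{1}))} = \frac{1}{2}.$$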


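Let us now check the variations of the function. After differentiating with respect to $\rho$ and rearranging terms, one can remark that the derivative of $\mu(\rho)$ must verify the following logistic differential equation with varying coefficient

$$\mu'(\rho) = \alpha(\rho)\, \mu(\rho) (1 - \mu(\rho)),$$

where

$$\alpha(\rho) = \frac{\underline{u}_{0} \exp(\rho \underline{u}_{0}) - \underline{u}_{1} \exp(\rho \underline{u}_{1})}{\exp(\rho \underline{u}_{0}) - \exp(\rho \underline{u}_{1})} - \frac{\overline{u}_{1} \exp(\rho \overline{u}_{1}) - \overline{u}_{0} \exp(\rho \overline{u}_{0})}{\exp(\rho \overline{u}_{1}) - \exp(\rho \overline{u}_{0})}$$

for all $\rho \in \mathbb{R}^{*}_{+}$, together with the initial condition $\mu(0) = \mu^{B}$. Hence, $\alpha$ completely dictates the variations of $\mu(\rho)$. Let us study the properties of the function $\alpha$ defined on $\mathbb{R}^{*}_{+}$. First, again applying l'Hôpital's rule, its limits are given by

$$\lim_{\rho \to 0} \alpha(\rho) = \frac{\underline{u}_{0} - \overline{u}_{0} - (\overline{u}_{1} - \underline{u}_{1})}{2},$$

and

$$\lim_{\rho \to +\infty} \alpha(\rho) = \underline{u}_{0} - \overline{u}_{1} = u_{\max}.$$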
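Second, after rearranging terms, its derivative is given by

$$\alpha'(\rho) = \frac{1}{2} \left[ \frac{(\overline{u}_{1} - \overline{u}_{0})^{2}}{\cosh(\rho(\overline{u}_{1} - \overline{u}_{0})) - 1} - \frac{(\underline{u}_{0} - \underline{u}_{1})^{2}}{\cosh(\rho(\underline{u}_{0} - \underline{u}_{1})) - 1} \right]$$

for any $\rho \in \mathbb{R}^{*}_{+}$, where $\cosh$ is the hyperbolic cosine function defined by

$$\cosh(x) = \frac{e^{x} + e^{-x}}{2}$$

for any $x \in \mathbb{R}$. Remark that the function defined by

$$x \mapsto \frac{x^{2}}{\cosh(\rho x) - 1} \tag{B.3}$$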


is strictly decreasing on R*. So, we have a‘(p) < 0 and therefore yu strictly 


decreasing for all p € R* if and only if ug — u, > U; — Up. Accordingly, a is always a 


strictly monotonic function if and only if up # UW, and U F u,. Hence, excluding the 


extreme case where up = U and UW = u, so a’(p) = 0 and p(p) = p? for all p € R%, 


three interesting cases arise, all depicted on figure B.1 for different payoff matrices: 


(i) If Umax < 0, function a has a constant sign for any p € | 


in which case p™ is strictly decreasing from pi? to 0. 


R* if and only if uo < uw, 


In case up > Uy, @ has a 


varying sign so yu" starts from pz? and is sequentially strictly increasing and 


strictly decreasing toward 0. 


(ii) If Umax = 0, function a has a constant sign for any p € R},. In this case uu is 


strictly increasing from yw? to 1/2 if and only if uo > uy. 


(iii) If Umax > 0, function a has a constant sign for any p € 


in which case yz" is strictly increasing from py? to 1. 


R* if and only if up > u1, 


In case up < uy, a has a 


varying sign so ™ starts from pz? and is sequentially strictly decreasing and 


strictly increasing toward 1. 


Accordingly, in case $\mu^{W}$ is non-monotonic in $\rho$, there always exists some $\rho > 0$ such that $\mu^{W}(\rho) = \mu^{B}$. This concludes the proof.

[Figure B.1 has three panels, each plotted against $\rho$: (a) Functions $\alpha$ and $\mu^{W}$ when $u_{\max} < 0$. (b) Functions $\alpha$ and $\mu^{W}$ when $u_{\max} = 0$. (c) Functions $\alpha$ and $\mu^{W}$ when $u_{\max} > 0$.]

Figure B.1: Functions $\alpha$ and $\mu^{W}$ for different payoff matrices $(u_{a}^{\theta})_{(a,\theta) \in A \times \Theta}$. Action $a = 1$ is favored by a wishful Receiver whenever $\mu^{W} < \mu^{B}$.
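
The behaviour of the threshold can be explored numerically. The following sketch is a hedged illustration: the payoff values and function names are hypothetical, each state's payoffs are passed as a pair `(u_0, u_1)` for actions 0 and 1, and $\alpha$ is computed as the difference of the two log-derivatives appearing in the proof. The code evaluates the threshold and $\alpha$, and lets one compare a finite-difference derivative of the threshold with the logistic form $\alpha \mu (1 - \mu)$.

```python
import math


def threshold(rho, u_low, u_high):
    """Wishful belief threshold mu(rho); u_low and u_high are (u_0, u_1) payoff pairs."""
    n = math.exp(rho * u_low[0]) - math.exp(rho * u_low[1])
    m = math.exp(rho * u_high[1]) - math.exp(rho * u_high[0])
    return n / (n + m)


def alpha(rho, u_low, u_high):
    """Coefficient alpha(rho) of the logistic equation mu' = alpha * mu * (1 - mu)."""
    n = math.exp(rho * u_low[0]) - math.exp(rho * u_low[1])
    dn = u_low[0] * math.exp(rho * u_low[0]) - u_low[1] * math.exp(rho * u_low[1])
    m = math.exp(rho * u_high[1]) - math.exp(rho * u_high[0])
    dm = u_high[1] * math.exp(rho * u_high[1]) - u_high[0] * math.exp(rho * u_high[0])
    return dn / n - dm / m


# Hypothetical payoffs: action 0 is best in the low state, action 1 in the high state.
U_LOW = (2.0, 0.0)
U_HIGH = (0.0, 1.0)
BAYESIAN_THRESHOLD = (U_LOW[0] - U_LOW[1]) / (
    (U_LOW[0] - U_LOW[1]) + (U_HIGH[1] - U_HIGH[0]))
```

With these payoffs the highest payoff of action 0 exceeds that of action 1, so the threshold starts near the Bayesian value 2/3 for small $\rho$ and approaches 1 as $\rho$ grows.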
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B.4 Proof of proposition 5 


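Assume $|\Theta| = n$ where $2 < n < \infty$. We want to show that $\Delta^{B}_{1} \subset \Delta^{W}_{1}$ if, and only if, the payoff matrix $(u(a,\theta))_{(a,\theta) \in A \times \Theta}$ and the wishfulness $\rho$ verify at least one of properties (i), (ii) or (iii) in lemma 3 for every pair of states $\theta, \theta' \in \Theta$.

Extreme point representation for $\Delta^{B}_{1}$ and $\Delta^{W}_{1}$. First, remark that $\Delta^{B}_{a}$ and $\Delta^{W}_{a}$ are both convex polytopes in $\mathbb{R}^{|\Theta|}$ defined by

$$\Delta^{B}_{a} = \Delta(\Theta) \cap \left\{ \mu \in \mathbb{R}^{|\Theta|} : \forall a' \in A, \ \sum_{\theta \in \Theta} u(a,\theta) \mu(\theta) \geq \sum_{\theta \in \Theta} u(a',\theta) \mu(\theta) \right\}$$

and

$$\Delta^{W}_{a} = \Delta(\Theta) \cap \left\{ \mu \in \mathbb{R}^{|\Theta|} : \forall a' \in A, \ \sum_{\theta \in \Theta} \exp(\rho u(a,\theta)) \mu(\theta) \geq \sum_{\theta \in \Theta} \exp(\rho u(a',\theta)) \mu(\theta) \right\}.$$

The sets $\Delta^{B}_{a}$ and $\Delta^{W}_{a}$ are thus compact and convex sets in $\mathbb{R}^{|\Theta|}$ with finitely many extreme points. Let us now characterize the sets of extreme points of $\Delta^{B}_{1}$ and $\Delta^{W}_{1}$.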


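For any $\mu \in \mathbb{R}^{|\Theta|}$, define the systems of equations

$$A^{B} \cdot \mu = b, \quad \mu \geq 0$$

and

$$A^{W} \cdot \mu = b, \quad \mu \geq 0$$

where

$$A^{B} = \begin{pmatrix} u^{B}(\theta_{1}) & \dots & u^{B}(\theta_{n}) \\ 1 & \dots & 1 \end{pmatrix} \quad \text{and} \quad A^{W} = \begin{pmatrix} u^{W}(\theta_{1}) & \dots & u^{W}(\theta_{n}) \\ 1 & \dots & 1 \end{pmatrix}$$

are $2 \times n$ matrices, where $u^{B}(\theta) = u(1,\theta) - u(0,\theta)$ and $u^{W}(\theta) = \exp(\rho u(1,\theta)) - \exp(\rho u(0,\theta))$ for any $\theta \in \Theta$, and $b = (0, 1)^{\top}$.

In what follows, we always assume that $(u^{B}(\theta))_{\theta \in \Theta}$ and $(u^{W}(\theta))_{\theta \in \Theta}$ are such that $\operatorname{rank}(A^{B}) = \operatorname{rank}(A^{W}) = 2$.$^{1}$ Let us recall some mathematical preliminaries.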

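Definition 7 (Basic feasible solution). Let $\theta, \theta' \in \Theta$ be any pair of states. A vector $\mu^{*}$ is a basic feasible solution to $A^{B} \cdot \mu = b$ (resp. $A^{W} \cdot \mu = b$), $\mu \geq 0$, for $\theta, \theta'$ if $A^{B} \cdot \mu^{*} = b$ (resp. $A^{W} \cdot \mu^{*} = b$), $\mu^{*}(\theta), \mu^{*}(\theta') \geq 0$ and $\mu^{*}(\theta'') = 0$ for any $\theta'' \neq \theta, \theta'$.

Lemma 5 (Extreme point representation for convex polyhedra). A vector $\mu \in \mathbb{R}^{|\Theta|}$ is an extreme point of the convex polyhedron $\Delta^{B}_{1}$ (resp. $\Delta^{W}_{1}$) if, and only if, $\mu$ is a basic feasible solution to $A^{B} \cdot \mu = b$, $\mu \geq 0$ (resp. $A^{W} \cdot \mu = b$, $\mu \geq 0$).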


Proof. See Panik (1993) Theorem 8.4.1. 


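Therefore, to find extreme points of $\Delta^{B}_{1}$, we just have to solve the system of equations

$$\begin{cases} \mu(\theta) u^{B}(\theta) + \mu(\theta') u^{B}(\theta') = 0 \\ \mu(\theta) + \mu(\theta') = 1 \\ \mu(\theta), \mu(\theta') \geq 0 \end{cases} \tag{B.4}$$

for any pair of states $\theta, \theta'$. When either $\mu(\theta) = 0$ or $\mu(\theta') = 0$, the solution to (B.4) is given by the Dirac measure $\delta_{\theta}$ only if $u^{B}(\theta) > 0$. Denote $\mathcal{E}^{B}_{1}$ the set of such beliefs. The set $\mathcal{E}^{B}_{1}$ then corresponds to the set of degenerate beliefs under which a Bayesian Receiver would take action $a = 1$. Now, if $\mu(\theta), \mu(\theta') > 0$ then the solution to (B.4) is given by

$$\mu^{B}_{\theta,\theta'} = \frac{u(0,\theta') - u(1,\theta')}{u(0,\theta') - u(1,\theta') + u(1,\theta) - u(0,\theta)}.$$

Such a belief is exactly the belief on the edge of the simplex between $\delta_{\theta}$ and $\delta_{\theta'}$ at which a Bayesian decision-maker is indifferent between action $a = 0$ and $a = 1$. Denote $\mathcal{I}^{B}$ the set of such beliefs. Hence, we have

$$\operatorname{ext}(\Delta^{B}_{1}) = \mathcal{E}^{B}_{1} \cup \mathcal{I}^{B}.$$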


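Following the same procedure, the set of extreme points of $\Delta^{W}_{1}$ is given by $\mathcal{E}^{W}_{1} \cup \mathcal{I}^{W}$, where $\mathcal{E}^{W}_{1}$ is the set of degenerate beliefs at which $u^{W}(\theta) > 0$ and $\mathcal{I}^{W}$ is the set of beliefs

$$\mu^{W}_{\theta,\theta'}(\rho) = \frac{\exp(\rho u(0,\theta')) - \exp(\rho u(1,\theta'))}{\exp(\rho u(0,\theta')) - \exp(\rho u(1,\theta')) + \exp(\rho u(1,\theta)) - \exp(\rho u(0,\theta))},$$

for any $\theta, \theta' \in \Theta$. Now, applying Krein-Milman theorem, we can state that

$$\Delta^{B}_{1} = \operatorname{co}\left(\mathcal{E}^{B}_{1} \cup \mathcal{I}^{B}\right)$$

and

$$\Delta^{W}_{1} = \operatorname{co}\left(\mathcal{E}^{W}_{1} \cup \mathcal{I}^{W}\right).$$

$^{1}$ This amounts to assuming that payoffs are not constant across states.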


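Sufficiency. Assume the payoff matrix $(u(a,\theta))_{(a,\theta) \in A \times \Theta}$ and the wishfulness $\rho$ verify at least one of properties (i), (ii) or (iii) in lemma 3 for every pair of states $\theta, \theta' \in \Theta$. Therefore, we have $\mu^{W}_{\theta,\theta'}(\rho) \geq \mu^{B}_{\theta,\theta'}$ for any $\theta, \theta' \in \Theta$. This implies $\mathcal{I}^{B} \subset \Delta^{W}_{1}$, since action $a = 1$ is favored by a wishful Receiver on each edge of the simplex. Moreover, it is trivially satisfied that $\mathcal{E}^{B}_{1} = \mathcal{E}^{W}_{1}$. Hence, since any point in $\Delta^{B}_{1}$ can be written as a convex combination of points in $\mathcal{E}^{B}_{1} \cup \mathcal{I}^{B} \subset \Delta^{W}_{1}$, it follows that $\Delta^{B}_{1} \subset \Delta^{W}_{1}$.

Necessity. Assume now that $\Delta^{B}_{1} \subset \Delta^{W}_{1}$. Therefore, we have $\mu^{W}_{\theta,\theta'}(\rho) \geq \mu^{B}_{\theta,\theta'}$ for any $\theta, \theta' \in \Theta$, which implies that $(u(a,\theta))_{(a,\theta) \in A \times \Theta}$ and the wishfulness $\rho$ verify at least one of properties (i), (ii) or (iii) in lemma 3 for every pair of states $\theta, \theta' \in \Theta$.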


B.5 Proof of proposition 8 


First, note that we can always index the voters in an ascending order of $\beta$, such that $\eta(\mu, \beta^{i}) \geq \eta(\mu, \beta^{j})$ for all $\mu \in \Delta(\Theta)$ whenever $i < j$, such that

$$\sum_{i=1}^{n-1} \sum_{j=i+1}^{n} \eta(\mu, \beta^{i}) - \eta(\mu, \beta^{j})$$

does indeed represent the absolute difference between each pair of beliefs. Now, remark that the sum can be rearranged in the following way:

$$\begin{aligned} \pi(\mu) &= \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} \eta(\mu, \beta^{i}) - \eta(\mu, \beta^{j}) \\ &= (n-1)\eta(\mu, \beta^{1}) + (n-2)\eta(\mu, \beta^{2}) - \eta(\mu, \beta^{2}) + \dots + \eta(\mu, \beta^{n-1}) - (n-2)\eta(\mu, \beta^{n-1}) - (n-1)\eta(\mu, \beta^{n}) \\ &= \sum_{i=1}^{m} (n + 1 - 2i) \left( \eta(\mu, \beta^{i}) - \eta(\mu, \beta^{n+1-i}) \right), \end{aligned}$$

for any $\mu \in [0,1]$, where $m = (n+1)/2$. That is, we can express it in terms of the differences in beliefs among voters who are equidistant from the median. To see that this is true, we need to first realize that each belief appears $n - 1$ times in equation (2.6) (since each belief is paired once with each of the other $n - 1$ beliefs). The beliefs of voters below the median appear more often as positive than negative (the belief of the first voter is positive in all of its pairings, the belief of the second voter is positive in all of its pairings except for the pairing with the first voter, etc.), whereas the beliefs of voters above the median are more often negative than positive. If we rearrange the terms of the sum in order to pair symmetric voters, the term $(\eta(\mu, \beta^{1}) - \eta(\mu, \beta^{n}))$ appears $n - 1$ times, whereas the term $(\eta(\mu, \beta^{2}) - \eta(\mu, \beta^{n-1}))$ appears $n - 3$ times, since out of the $n - 1$ times $\eta(\mu, \beta^{2})$ appears in equation (2.6), $n - 2$ of them are positive and 1 is negative (the converse is true for $\eta(\mu, \beta^{n-1})$). One can continue the same reasoning for all the pairs of symmetric voters, and get to the formulation of $\pi(\mu)$ presented above. Note, also, that the belief of the median voter is summed and subtracted at the same rate, such that it does not matter in our measure of polarization.
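
The rearrangement is a purely combinatorial identity, so it can be checked on arbitrary numbers. The sketch below is illustrative only (the function names are ours): it compares the double sum over ordered pairs with the weighted sum over symmetric pairs, using the weights $n + 1 - 2i$ for the $i$-th belief in the ordering.

```python
def pairwise_sum(beliefs):
    """Sum of beliefs[i] - beliefs[j] over all pairs i < j (beliefs sorted in decreasing order)."""
    n = len(beliefs)
    return sum(beliefs[i] - beliefs[j] for i in range(n) for j in range(i + 1, n))


def symmetric_sum(beliefs):
    """Same quantity written with weights (n + 1 - 2i) on symmetric pairs (i, n + 1 - i)."""
    n = len(beliefs)
    m = (n + 1) // 2
    # i is 1-based as in the text; beliefs[i - 1] and beliefs[n - i] are the symmetric pair.
    return sum((n + 1 - 2 * i) * (beliefs[i - 1] - beliefs[n - i])
               for i in range(1, m + 1))


example_odd = [0.9, 0.7, 0.5, 0.4, 0.1]
example_even = [0.8, 0.6, 0.3, 0.2]
```

For an odd number of voters the median term receives weight zero, which matches the remark that the median belief does not affect the measure.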

Consider the distance between beliefs of any pair of symmetric voters $\eta(\mu, \beta^{i}) - \eta(\mu, \beta^{n+1-i})$ for $i \in \{1, \dots, m\}$. Given our symmetry assumption these two agents are symmetric, such that $\beta^{i} = 1 - \beta^{n+1-i}$. It is not difficult to show that any of those pairwise distances is maximized when agent $i$ is distorting its belief upwards and agent $n + 1 - i$ is distorting its belief downwards. That is, when $\mu \in [\mu^{W}(\beta^{i}), \mu^{W}(\beta^{n+1-i})]$.

First, the distance between symmetric beliefs in such an interval can be rewritten as

$$\eta(\mu, \beta^{i}) - \eta(\mu, \beta^{n+1-i}) = \frac{\mu \exp(\rho \beta^{i})}{\mu \exp(\rho \beta^{i}) + (1 - \mu)} - \frac{\mu}{\mu + (1 - \mu) \exp(\rho \beta^{i})}$$

for any $i \in \{1, \dots, m\}$ and $\mu \in [\mu^{W}(\beta^{i}), \mu^{W}(\beta^{n+1-i})]$.

Second, by taking the first order condition in this interval and rearranging it we get

$$\frac{\mu + (1 - \mu) \exp(\rho \beta^{i})}{\mu \exp(\rho \beta^{i}) + (1 - \mu)} = 1,$$

such that the difference between symmetric beliefs is maximized uniquely at

$$\mu = \mu^{W}(\beta^{m}) = \frac{1}{2}$$

for any $i \in \{1, \dots, m\}$, $\beta^{i} \in \, ]0,1[$ and any $\rho \in \mathbb{R}^{*}_{+}$. Since

$$\mu^{W}(\beta^{m}) = \arg\max_{\mu \in [0,1]} \ \eta(\mu, \beta^{i}) - \eta(\mu, \beta^{n+1-i})$$

for any $i \in \{1, \dots, m\}$, we get

$$\mu^{W}(\beta^{m}) = \arg\max_{\mu \in [0,1]} \pi(\mu),$$

which concludes the proof.
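
The maximization step can be illustrated numerically. The sketch below rests on an assumption of ours about the functional form: the upward-distorted belief is taken to be $\mu e^{\rho\beta} / (\mu e^{\rho\beta} + 1 - \mu)$ and the downward-distorted belief $\mu / (\mu + (1 - \mu) e^{\rho\beta})$. The function name, parameter values and grid are hypothetical, and the restriction of $\mu$ to the interval between the two thresholds is ignored.

```python
import math


def symmetric_gap(mu, rho, beta):
    """Assumed gap between an upward-distorted and a downward-distorted belief."""
    e = math.exp(rho * beta)
    upward = mu * e / (mu * e + (1.0 - mu))
    downward = mu / (mu + (1.0 - mu) * e)
    return upward - downward


def argmax_on_grid(rho, beta, steps=1000):
    """Grid point in [0, 1] at which the assumed gap is largest."""
    grid = [k / steps for k in range(steps + 1)]
    return max(grid, key=lambda mu: symmetric_gap(mu, rho, beta))
```

Under this assumed form the gap is symmetric around $\mu = 1/2$ and concave, so the grid search returns $0.5$ for any positive $\rho$ and $\beta$.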


B.6 Proof of proposition 7 


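First, we define the function

$$\psi(z) = \frac{1}{1 - F(z)} \int_{z}^{\overline{\theta}} \exp(\rho \theta) f(\theta)\, d\theta,$$

for any $z \in [\underline{\theta}, \overline{\theta}[$ and adopt the convention that $\psi(\overline{\theta}) = \exp(\rho \overline{\theta})$. It is not difficult to show that $\psi$ is a continuous and strictly increasing function from $\psi(\underline{\theta}) = \int_{\underline{\theta}}^{\overline{\theta}} \exp(\rho \theta) f(\theta)\, d\theta < 1$ to $\psi(\overline{\theta}) = \exp(\rho \overline{\theta})$. Define similarly the function

$$\varphi(z) = \frac{1}{1 - F(z)} \int_{z}^{\overline{\theta}} \theta f(\theta)\, d\theta,$$

for any $z \in [\underline{\theta}, \overline{\theta}[$ and $\varphi(\overline{\theta}) = \overline{\theta}$. Again, it is not difficult to show that $\varphi$ is a continuous and strictly increasing function from $\varphi(\underline{\theta}) = m < 0$ to $\varphi(\overline{\theta}) = \overline{\theta}$.

Since $\psi$ is strictly increasing, it thus suffices to show that $\psi(\theta^{B}) > 1 = \psi(\theta^{W})$ to prove that $\theta^{W} < \theta^{B}$. Applying Jensen's inequality, it follows that

$$\psi(z) > \exp(\rho \varphi(z)),$$

for any $z \in \, ]\underline{\theta}, \overline{\theta}[$, where the strict inequality comes from the strict convexity of $z \mapsto \exp(\rho z)$ and the non-degeneracy of $F$. In particular, Jensen's inequality holds with equality at $\underline{\theta}$ and $\overline{\theta}$, but, by the intermediate value theorem, it must be that $\theta^{B}$ (as well as $\theta^{W}$) lie in the open interval $]\underline{\theta}, \overline{\theta}[$. Thus, we have

$$\psi(\theta^{B}) > 1,$$

since $\varphi(\theta^{B}) = 0$ and $\theta^{B} \neq \underline{\theta}, \overline{\theta}$.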
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Appendix C 


Appendix for Chapter 3 


C.1 Proof of lemma 4 


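Proof. We can denote the posterior belief of receiver $i$ after observing a message $m \in p_{i}(m)$ as:

$$\mu_{i}(\omega \mid p_{i}(m)) = \frac{\sigma(p_{i}(m) \mid \omega)\, \mu_{0}(\omega)}{\sigma(p_{i}(m))} = \frac{\sum_{p_{j} \subset p_{i}(m)} \sigma(p_{j} \mid \omega)\, \mu_{0}(\omega)}{\sigma(p_{i}(m))},$$

since the set $\{p_{j} \mid p_{j} \subset p_{i}(m)\}$ must completely cover the set $p_{i}(m)$ when $P_{j}$ is a refinement of $P_{i}$. Since $\mu_{j}(\omega \mid p_{j})\, \sigma(p_{j}) = \sigma(p_{j} \mid \omega)\, \mu_{0}(\omega)$, it follows that:

$$\mu_{i}(\omega \mid p_{i}(m)) = \sum_{p_{j} \subset p_{i}(m)} \frac{\sigma(p_{j})}{\sigma(p_{i}(m))}\, \mu_{j}(\omega \mid p_{j}).$$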


C.2 Proof of proposition 10 


Proof. From proposition 9 we know that the distribution of $\mu_{i}$ impacts sender's payoffs not only by defining the distribution of actions of agent $i$, but also by constraining the distribution of any $\mu_{j}$ for $j > i$. One then needs to "backwards induct" on the beliefs in order to determine the proper value of $\mu_{i}$.

From corollary 2 of Kamenica and Gentzkow (2011) and proposition 9 of this paper we know that given any $\mu_{n-1}$, the value of receiver $n$'s subtext for the sender must be given by $\sup\{z \mid (\mu_{n-1}, z) \in \operatorname{co}(\hat{v}_{n})\} = V_{n}(\mu_{n-1})$, such that the value of inducing a particular belief $\mu_{n-1}$ should be $\hat{v}_{n-1}(\mu_{n-1}) + V_{n}(\mu_{n-1})$.

Given that, one could compute the value of $n - 1$'s subtext given any $\mu_{n-2}$ as $\sup\{z \mid (\mu_{n-2}, z) \in \operatorname{co}(\hat{v}_{n-1} + V_{n})\} = V_{n-1}(\mu_{n-2})$. Recursing the argument until $V_{1}(\mu_{0})$ is obtained concludes the proof.
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